
Create new cache-friendly lua/parser function for "is today before X date" and "is today after X date"
Open, Needs TriagePublic

Description

When trying to improve parser cache hit rates, we discovered that a lot of pages expire at midnight (UTC or local) every night because they reference the YYMMDD date. This causes a big drop in cache hit rate every night (~20%). Looking at the raw request rate (hit+miss), we can confirm that this isn't caused by a nightly spider.

This could be solved partially by selective update (T373258), which could track which fragment was responsible for the reduced cache rate and just recompute that fragment.

But at the same time we could give wikitext/scribunto better date comparison tools.  "Is this date in the past" is a comparison that, once true, will never not be true and shouldn't trigger further daily recomputes. (Improved templates could also be part of the cross-wiki code collab pilot.)

https://en.wikipedia.org/wiki/Module:Article_history/config#L-915 through line 1028 is what this looks like in practice currently. The call to getYmdDate at the top is what is causing the entire page to expire at midnight. [[Module:Article_history]] alone is used on 51,000 pages on enwiki.

The existing date code is just in mw.language.formatDate shipped with scribunto, and there is also {{CURRENTDATE}} in the parser: we should ship both a Lua version of this new function and a wikitext version.

We will probably also need to provide ways to identify pages which are currently cache-unfriendly and motivate conversion to the new parser function. Using an "Expires daily" tracking category might be a good start, but it might have too many pages in it.

As a straw proposal: a date-comparison function with an implicit "today", so that isTodayBefore(<2 weeks from now>) could intelligently set an expiry of <2 weeks from now> instead of <when today changes>. An optional second parameter would select "wiki local" or "utc" time. Similarly, isTodayAfter(<2 weeks from now>) and isTodayEqual(<future date>) can be provided, although they can be emulated by the user from the base function without loss of efficiency.
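
A minimal sketch of the straw proposal, written in Python purely for illustration (the real function would live in Scribunto or the parser). The name is_today_before, the returned TTL, and the use of None for "no reduced expiry needed" are assumptions, not an existing API.

```python
from datetime import date, datetime, time, timezone

def is_today_before(target, now, tz=timezone.utc):
    """Return (result, ttl_seconds) for "is today before target?".

    ttl_seconds is how long the answer stays valid; None means the answer
    can never change again, so no reduced expiry is needed.
    Hypothetical helper, not an existing MediaWiki or Scribunto API.
    """
    local_now = now.astimezone(tz)
    if local_now.date() >= target:
        # Once the target date has arrived the answer is False forever.
        return False, None
    # The answer flips when the target date begins in the chosen timezone.
    flip = datetime.combine(target, time.min, tzinfo=tz)
    return True, int((flip - local_now).total_seconds())

now = datetime(2026, 2, 5, 16, 36, tzinfo=timezone.utc)
print(is_today_before(date(2026, 2, 19), now))  # target two weeks ahead
print(is_today_before(date(2026, 1, 1), now))   # target already past
```

The point of the design is that the expiry tracks the single moment the answer changes, rather than every rollover of "today".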

(I was also a little surprised to see these functions on enwiki using the UTC time instead of the "local" time, so the front page turns over with the UTC date. Maybe that makes sense given the global scope of enwiki. But for itwiki (e.g.) I'd expect that using an Italian local timezone for "article of the day" would make more sense.)

Some additional discussion of the performance impact is in T416540#11592069. Here are some 30-day graphs to quantify the effect, and you can see the sharp daily drops in cache hit rate:

image.png (549×931 px, 127 KB)
image.png (549×931 px, 130 KB)

Event Timeline


@cscott: Please add project tags to tasks, so other people can find tasks when searching via projects or looking at workboards. Thanks.

cscott renamed this task from Create new cache-friendly lua/parser function for "is date before X" and "is date after X" to Create new cache-friendly lua/parser function for "is today before X date" and "is today after X date". Feb 5 2026, 4:36 PM
cscott updated the task description.

This is a fantastic idea!

In addition to isBefore and isAfter (preferring these over isTodayBefore to allow for finer-than-daily resolution), maybe a timeSince() to cover use-cases like calculating a person's age. Perhaps:

dalaiLamaBirthdate = os.time({year=1935, month=7, day=6, hour=0})

totalYears = timeSince(dalaiLamaBirthdate, {"years"})
--> {years = 90}

totalDays = timeSince(dalaiLamaBirthdate, {"days"})
--> {days = 33093}

yearsMonthsDays = timeSince(dalaiLamaBirthdate, {"years", "months", "days"})
--> {years = 90, months = 7, days = 8}

Expiry is calculated by predicting when the value will change, based on the smallest requested time unit.
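
To illustrate the expiry prediction for the "years" unit, here is a rough Python sketch. time_since_years and anniversary are hypothetical names, and the Feb 29 handling (treating the anniversary as March 1 in non-leap years) is an assumption rather than part of the proposal.

```python
from datetime import datetime, timezone

def anniversary(birth, year):
    """Anniversary of `birth` in `year`; Feb 29 falls back to Mar 1 (assumed)."""
    try:
        return birth.replace(year=year)
    except ValueError:
        return birth.replace(year=year, month=3, day=1)

def time_since_years(birth, now):
    """Return (whole_years, seconds_until_the_value_changes)."""
    years = now.year - birth.year
    if now < anniversary(birth, now.year):
        years -= 1
    # The value next changes at the following anniversary.
    next_change = anniversary(birth, birth.year + years + 1)
    return years, int((next_change - now).total_seconds())

birth = datetime(1935, 7, 6, tzinfo=timezone.utc)
now = datetime(2026, 2, 13, tzinfo=timezone.utc)
print(time_since_years(birth, now))
```

With a "years" unit the page could stay cached for months; only a request for "days" would pull the expiry down to the next midnight.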

Do we know why this became a problem recently?

I couldn't find it in the task description. I get why it makes sense as an optimisation in theory, but it makes equal sense to me why it might be insignificant in light of

  1. CDN cache being unshortened and thus not expiring near midnight,
  2. ParserCache expiry having a fudge factor that spreads out cache misses,
  3. PoolCounter fast stale,
  4. natural demand across a long tail of different articles not all coinciding.

Example: https://en.wikipedia.org/wiki/António_José_Seguro (linked from today's Main Page)

The infobox contains an "age" that presumably uses a current-time magic word with day-level accuracy where ParserCache expires near the next midnight.

NewPP limit report
Parsed by mw‐web.eqiad.main‐5c9fc7f974‐gzmn2
Cached time: 2026-02-13T17:24:49
Cache expiry: 1802
Reduced expiry: true
Complications: [vary‐revision‐sha1, show‐toc]
…
Saved in parser cache with key enwiki:pcache:35591042:|#|:idhash:canonical and timestamp 20260213172449 and revision id 1338174486. Rendering was triggered because: page_view

I'm not sure where the 1802 expiry comes from. Assuming CoreMagicVariables::applyCacheExpiry, there's a bit of fudging such as +1 from DEADLINE_TTL_CLOCK_FUDGE, and give or take +4 from DEADLINE_TTL_STAGGER_MAX=15; $stagger = strtotime('2026-02-13T17:24:49Z') = int(1771003489); 1771003489 % 15 = 4;, which sets it to expire some ~6 minutes from the top of the hour. I'm guessing there's either a different implementation elsewhere, or additional fudging I've missed. And I guess this means the implementation in question goes by hourly precision for some reason.

Anyway, the key bit is this:

krinkle@mw-experimental:~$ curl --connect-to ::$HOSTNAME:4456 -I 'https://en.wikipedia.org/wiki/Ant%C3%B3nio_Jos%C3%A9_Seguro'
HTTP/1.1 200 OK
date: Fri, 13 Feb 2026 19:10:37 GMT
server: mw-experimental.eqiad.pinkllama-8967448c6-rbgsl
x-powered-by: PHP/8.3.29
…
cache-control: s-maxage=86400, must-revalidate, max-age=0

And this, from Grafana: ParserCache dashboard:

Screenshot 2026-02-13 at 18.50.04.png (1×2 px, 439 KB)

Which suggests something changed to make these peaks more severe. I recall there were recent changes in Parser and Parsoid relating to improving custom TTLs. Maybe those were all good and correct, but those changes may be worth summarising in the task description for context.

Example: https://en.wikipedia.org/wiki/António_José_Seguro (linked from today's Main Page)

The infobox contains an "age" that presumably uses a current-time magic word with day-level accuracy where ParserCache expires near the next midnight.

From my test, it's the presence of {{birth date and age|1962|3|11|df=y}} and

<ref>{{cite web |title=António José Seguro, Assembleia da República |url=https://www.parlamento.pt/DeputadoGP/Paginas/Biografia.aspx?BID=155 |access-date=5 February 2025}}</ref><ref>{{cite web |title=As legislaturas da Assembleia da República |url=https://www.parlamento.pt/Parlamento/Paginas/dias-democracia-art4.aspx |access-date=5 February 2025}}</ref>

that reduces the cache expiry to 1800. A bare Infobox without them has the normal 2592000 (30 days) expiry.

The {{cite web}} case probably applies to preview mode only, due to detection for preview mode in Module:Citation/CS1. However, even in normal reading mode, CS1 still lowers the cache expiry below 86400 s (not caused by access-date validation). The culprit is is_valid_year() in Module:Citation/CS1/Date validation, which calls os.date("%Y"). Some other validations in Module:Citation/CS1/Identifiers may also come into play elsewhere.

The background probably has been described at T416540#11591871. For the legacy parser, the concerned patches should be ParserFunctions #1201838 and Scribunto #1201829.

@Bewfip is correct. There was also a change to the core magic words as far back as T320668 in March 2023.

@Krinkle what made the spikes more severe is that we fixed bugs in Parsoid and core which were causing them to ignore custom TTLs entirely (eg T408741: The functionality for "days left until" is not working correctly with parsoid) and once we started actually honoring the TTLs generated by the parser/scribunto/parser functions we started seeing the sharp nightly spikes.

As a transition strategy, I'd be in favor of removing all time-based TTLs from wikitext, Extension:ParserFunctions and Extension:Scribunto *except* for the new time-based functions. So essentially if you wanted your "days since" or "age" to be accurate, you should use the new functions, otherwise the page expires whenever it expires (14 days).

In T416616, @cscott wrote:

We will probably also need to provide ways to identify pages which are currently cache-unfriendly and motivate conversion to the new parser function. Using an "Expires daily" tracking category might be a good start, but it might have too many pages in it.

I'm probably misunderstanding your suggestion, but: is tagging the articles themselves going to be useful? Aren't most problem pages just relying on common templates that use a few core Lua modules for the date?

Because date access happens through specific interfaces, maybe code search is adequate.
https://global-search.toolforge.org/?q=getDate&namespaces=828&title=

I guess it doesn't tell you which are the really popular ones.

As a transition strategy, I'd be in favor of removing all time-based TTLs from wikitext, Extension:ParserFunctions and Extension:Scribunto *except* for the new time-based functions. So essentially if you wanted your "days since" or "age" to be accurate, you should use the new functions, otherwise the page expires whenever it expires (14 days).

+1

A tracking category is too coarse-grained to be useful (when widely used templates like CS1 trigger this issue). It had better be lint-like to allow editors to locate the template and do possible rewrites.

An alternative would be expanding NewPP limit report's "Reduced expiry" field to include the name of those templates or modules (or the bare magic words).

Change #1239867 had a related patch set uploaded (by C. Scott Ananian; author: Ori):

[mediawiki/extensions/Scribunto@master] Add mw.date library for cache-friendly date comparisons

https://gerrit.wikimedia.org/r/1239867

I think this issue is fairly widespread, judging from the impact it has on the cache rate, which is why I think a linting approach is best. Even with the new time-before/time-after functions, I feel like pages which have short timeouts (diffFromToday in units of day or smaller) should be added to some sort of category or lint so the size of that group can be monitored.

I saw usage of math.randomseed(), where the timestamp is used to seed the PRNG. Should there be a way to obtain the time without parser cache side effect?

I saw usage of math.randomseed(), where the timestamp is used to seed the PRNG. Should there be a way to obtain the time without parser cache side effect?

I can't think of a reason you'd want to seed the PRNG with the current date except to implement a "random foo of the day" feature, in which case you explicitly want the expiry.

[…] Grafana: ParserCache dashboard:

Screenshot 2026-02-13 at 18.50.04.png (1×2 px, 439 KB)

[..] what made the spikes more severe is that we fixed bugs in Parsoid and core which were causing them to ignore custom TTLs entirely (eg T408741: The functionality for "days left until" is not working correctly with parsoid) [..]

If these fixes made Parsoid match behaviour of the legacy Parser, then we would have already had a low cache hit rate on the legacy Parser, but historical data shows that isn't the case. It was fine before the change. Plus, the graph above is specifically pcache which is just the legacy Parser. It doesn't include Parsoid (parsoid_pcache).

If the legacy Parser ignored TTLs for time functions, as the commit message of change 1201767 says, we would have more bug reports about articles with stale content.

If the change mainly caused expiries around midnight, as the task description suggests, we would see 1) articles with their cache expiry offset from the next midnight, and 2) large spikes near midnight. Instead, we see the regression started a high cache miss rate spread evenly throughout the day, and anecdotally (comments above) we see articles have low TTLs like 30 minutes on enwiki for biographies, far from any midnight.

Together with @ArielGlenn I spent an hour today trying to run down some theories and understand what (if anything) regressed to cause this.

1802 seconds

NOTE: 💡 w:António José Seguro has a cache expiry of exactly 1802 seconds.

Using the António article as an example, it consistently parses with a custom TTL of 1802 seconds exactly. It is this amount regardless of time of day, purge after purge. No offset to midnight. No offset to next hour. There is also no fudge to spread it across a range of 15 seconds.

Cached time: 20260217223404
Cache expiry: 1802
Reduced expiry: true
…
Saved in parser cache with key enwiki:pcache:35591042:|#|:idhash:canonical and timestamp 20260217223404 and revision id 1338332765. Rendering was triggered because: page_view

We forked the article to https://en.wikipedia.org/wiki/User:Krinkle/Sandbox and observed the following from edits:

  • Copied with no changes (except removing categories). Cache expiry: 1802. Issue is reproducible.
  • Reduced to plain text with no magic words or Lua calls. Cache expiry: 2592000. 30 days, as expected. The system is not completely broken.
  • Reduced to {{birth date and age|1962|3|11|df=y}}. Cache expiry: 1802. Getting closer.
  • Swap for {{CURRENTDAY}}-{{CURRENTMONTH}}-{{CURRENTYEAR}}. In theory, this is the kind of information the template should be using (day-level accuracy). Cache expiry: 17540. As expected. Great!

The question remains: why is Age reducing the cache expiry to 30 minutes (1802 seconds), instead of allowing it to last the many hours until midnight?

We switch gears and continue the investigation on localhost (plain MW core via Quickstart, with just Cite, ParserFunctions, and Scribunto). I export my sandbox page, containing only {{birth date and age|1962|3|11|df=y}}, via Special:Export and import it locally via Special:Import. Going local means we can instrument the PHP code, for example to look at which magic words and Lua functions are called (instead of guessing by analyzing template wikitext and Lua modules).

capture.png (1×850 px, 147 KB)

Post-import we're dealing with only four pages:

Local observations:

  • Post-import we get Cache expiry: 1802, same as prod. The issue is reproducible on stock MediaWiki with default settings. This narrows things down by a lot. It rules out any influence from hundreds of other extensions, all of wmf-config including Parser/ParserCache-related settings, php-luasandbox (vs LuaStandalone), and much more.
  • Swap for {{CURRENTDAY}}-{{CURRENTMONTH}}-{{CURRENTYEAR}}. Cache time is offset from next midnight, as it should be, and matching prod.
  • Swap for {{CURRENTTIME}} (mw:Help:Magic_words). This prints clock time hour and minutes, e.g. 19:22. The cache expiry is 3600.
NOTE: 💡 Why does the most precise magic word (hour/minutes) trigger a fixed expiry of 1 hour, but the "Age" template triggers a fixed expiry of 30 minutes, despite needing much less precision?
Sidebar: Is there disagreement in the system over what the minimum cache expiry should be?

Maybe. At a glance, you get 1 hour when you embed high-precision time (minutes or seconds), our unknown friend gets 1802 seconds, and when you ask for low-precision time (hour or day) you get an expiry until that rolls over, which is usually several days/hours, but can sometimes be a few minutes if you happen to be near the end of the current window. This seems reasonable to me. Consider this:

On a popular page, {{CURRENTHOUR}} will expire at the top of the hour. The page will regenerate shortly after the hour starts, last an hour, and repeat. Great. On an unpopular page, we get the same functional outcome, but with a technically shorter TTL that does not perform worse, because it merely starts later into the same hour. It doesn't actually expire more often.

(End of sidebar)

At this point, our theories and questions are:

  • Maybe the Lua code is applying TTLs wrong, whereas wikitext magic words are applied correctly?
  • Perhaps by coincidence a popular enwiki template was edited around the same time in Nov 2025, in a way that causes Lua to think that it needs more precision than it actually does?
  • Why does the 1802 expiry specifically get chosen?

"Date" Lua module

We stripped Module:Date down to the following:

Module:Date_stripped
local function main()
  return os.date('!*t').year
end

return {
 main = main
}
{{#invoke:date_stripped|main}}
Cache expiry: 86400
Reduced expiry: false

Great, 24 hours. Calling os.date() does not plummet the cache expiry unconditionally, there must be some optimizations. In fact, we will find there are two optimisations. (Foreshadowing!)

  • return os.date('!*t').day yields Cache expiry: 7671, an offset to midnight (at 21:12).
  • return os.date('!*t').hour yields Cache expiry: 2870, an offset to the next whole hour (at :12)
  • local d = os.date('!*t') return d.hour .. ':' .. d.min yields Cache expiry: 1814, 1803, 1808, varying around a 15 second fudge.
local function main()
  local d = os.date('!*t')
  local x = nil
  x = d.year
  x = d.month
  x = d.day
  x = d.hour
  x = d.min
  x = d.sec

  return 'hi'
end

Cache expiry: 1802, every time, which we narrowed down to os.date('!*t').sec.

os.date() wrapper

Digging into the code, that brings us to Scribunto: LuaCommon/lualib/mw.lua

NOTE: 💡 When using os.date().sec in Lua, the cache expiry is always 1802. When using hour, day, or more, you get the offset you expect plus a fudge.
Scribunto: LuaCommon/lualib/mw.lua
local function wrapDateTable( now )
	return setmetatable( {}, {
		__index = function( t, k )
			if k == 'sec' then
				php.setTTL( 1 )
			elseif k == 'min' then
				php.setTTL( 60 - now.sec )
			elseif k == 'hour' then
				php.setTTL( 3600 - now.min * 60 - now.sec )
			elseif now[k] ~= nil then
				php.setTTL( 86400 - now.hour * 3600 - now.min * 60 - now.sec )

This looks funny because the code in the enwiki Date template is almost the same, using a similar empty object with a lazy-loading getter to populate the fields (enwiki: Module:Date)

enwiki: Module:Date
local current = setmetatable({}, {
	__index = function (self, key)
		local d = os.date('!*t')
		self.year = d.year
		self.month = d.month
		self.day = d.day
		self.hour = d.hour
		self.minute = d.min
		self.second = d.sec
		return rawget(self, key)
	end })

php.setTTL( 1 ) goes to LuaEngine::setTTL, which calls the familiar CoreMagicVariables::applyCacheExpiry method. There we have the 30-minute minimum, which translates to 1800 seconds. And because Scribunto passes a constant 1 as both the TTL and the stagger source, it adds a fudge of exactly 2s (+1 for the clock fudge, plus 1 % 15 = 1 for the stagger). That's one mystery solved!
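
The arithmetic can be reproduced with a small model. This Python sketch is reconstructed only from the numbers quoted in this thread (30-minute minimum, +1 clock fudge, stagger of the source modulo 15). It is not the actual CoreMagicVariables::applyCacheExpiry code, and the order of the steps is an assumption chosen because it matches the observed 1802.

```python
MIN_TTL = 1800      # 30-minute minimum described in this thread
CLOCK_FUDGE = 1     # DEADLINE_TTL_CLOCK_FUDGE, per the earlier comment
STAGGER_MAX = 15    # DEADLINE_TTL_STAGGER_MAX, per the earlier comment

def modeled_expiry(ttl, stagger_source):
    """Clamp to the minimum, then add the clock fudge and the stagger."""
    return max(ttl, MIN_TTL) + CLOCK_FUDGE + stagger_source % STAGGER_MAX

# Scribunto's setTTL( 1 ): the constant 1 is both TTL and stagger source.
print(modeled_expiry(1, 1))  # 1802
```

Under the same model, a varying sub-30-minute TTL (such as the one requested when reading min) would land anywhere from 1801 to 1815, which is in line with the 1803, 1808 and 1814 values seen above.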

Questions:

  • How did this work before the November changes?
  • Why 30 minutes instead of 1 hour?

We examined the November changes, specifically Scribunto change 1201829 ("Remove use of Frame::setTTL. Replace with CoreMagicVariables::applyCacheExpiry."). That change is linked to MediaWiki change 1202219, which exposed the magic word logic as CoreMagicVariables::applyCacheExpiry and sets a 30-minute minimum TTL. There was previously a 15-second minimum there, but it wasn't exposed in practice as a constant. It only applied when there was a higher conceptual TTL, such as to the next whole hour for "current hour", so that a parse near the end of an hour would not get less than that. It wasn't used as the deterministic TTL for any given input like we're seeing with os.date().sec and 1802s today.

We reverted my local install back to Nov 2025, before these changes:

  • MediaWiki core 640e84d7578755f45cf569fe41360e07c503b538 (before change 1201767)
  • Scribunto 8c1fca9d6ef0e577f5afe0af326882bd5894f9de (before change 1201829)

This yields a cache expiry of 86400. When we import the full article, and set $wgParserCacheExpireTime = 60 * 60 * 24 * 30; it goes to 2592000 even, so indeed it wasn't working when called from Lua.

Last questions:

  • Why does the os.date() wrapper optimization not work?
  • Why did it get so much worse?
  • Is there widespread misuse of these in Lua on-wiki, or do people generally use it correctly?

os.date() wrapper optimization

The code in Scribunto: LuaCommon/lualib/mw.lua returns an empty object that lazily returns individual fields such that if you only read hour or day it will adjust the cache TTL accordingly (i.e. until the next hour or day).

But in the case of enwiki's Module:Date (via Module:Age) this is penalizing the page as if it read os.date().sec. The code in Module:Date creates a very similar lazy object:

enwiki: Module:Date
local current = setmetatable({}, {
	__index = function (self, key)
		local d = os.date('!*t')
		self.year = d.year
		self.month = d.month
		self.day = d.day
		self.hour = d.hour
		self.minute = d.min
		self.second = d.sec
		return rawget(self, key)
	end })

function Date(...) 
	-- …
	elseif argtype == 'currentdate' or argtype == 'currentdatetime' then
		newdate.partial = nil
		newdate.year = current.year
		newdate.month = current.month
		newdate.day = current.day

local function bda(frame)
	-- …
	local diff = Date('currentdate') - date

This is redundant, because Scribunto does that already. But, moreover, it lazily populates a full object, not individual fields. This means while it computes nothing until current.day is accessed, it then triggers all properties, including the costly d.sec.
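
The interaction can be modelled outside MediaWiki. In this Python sketch, TrackedDate stands in for Scribunto's wrapDateTable (recording the smallest TTL that any field read asks for), and the two functions stand in for a caller that reads only what it needs versus one that copies every field, as Module:Date does. All names here are illustrative.

```python
class TrackedDate:
    """Date fields whose reads record the TTL they imply (lowest wins)."""

    def __init__(self, hour, minute, sec, **fields):
        self._f = dict(fields, hour=hour, min=minute, sec=sec)
        self.ttl = None  # None: nothing time-sensitive was read yet

    def get(self, key):
        h, m, s = self._f["hour"], self._f["min"], self._f["sec"]
        if key == "sec":
            ttl = 1
        elif key == "min":
            ttl = 60 - s
        elif key == "hour":
            ttl = 3600 - m * 60 - s
        else:
            ttl = 86400 - h * 3600 - m * 60 - s
        self.ttl = ttl if self.ttl is None else min(self.ttl, ttl)
        return self._f[key]

def read_day_only(d):
    return d.get("day")

def copy_all_fields(d):
    # Mirrors the eager copy: every field is touched, including "sec".
    return {k: d.get(k) for k in ("year", "month", "day", "hour", "min", "sec")}

a = TrackedDate(20, 45, 31, year=2026, month=2, day=18)
read_day_only(a)
b = TrackedDate(20, 45, 31, year=2026, month=2, day=18)
copy_all_fields(b)["day"]
print(a.ttl, b.ttl)  # 11669 1
```

Both callers only use the day, but the eager copy leaves the page with the one-second TTL.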

This wasn't always the case. Module:Age was edited to adopt Module:Date, and in doing so switched from a current object containing only year/month/day to a version that included hour/min/sec. I confirmed locally that the above wrapper is redundant, and that without it, the article has the correct expiry.

Before:

{{birth date and age|1962|3|11|df=y}}
Cache expiry: 1802

After:

Module:Age
-local current = setmetatable({}, {
-	__index = function (self, key)
-		local d = os.date('!*t')
-		self.year = d.year
-		self.month = d.month
-		self.day = d.day
-		self.hour = d.hour
-		self.minute = d.min
-		self.second = d.sec
-		return rawget(self, key)
-	end })
+local current = os.date('!*t')
Cached time: 20260218204531
Cache expiry: 11684

Success! (This is 3.2h, which was the offset to midnight; instead of 30 minutes / 1802s)
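
As a cross-check of that number, the offset to UTC midnight can be computed from the "Cached time" stamp. The helper name below is made up for this example. The seconds left over relative to the reported expiry are presumably the fudge and stagger discussed earlier.

```python
from datetime import datetime, timedelta

def seconds_until_utc_midnight(stamp):
    """stamp is a MediaWiki-style YYYYMMDDHHMMSS string, taken as UTC."""
    t = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    midnight = (t + timedelta(days=1)).replace(hour=0, minute=0, second=0)
    return int((midnight - t).total_seconds())

offset = seconds_until_utc_midnight("20260218204531")
print(offset, 11684 - offset)  # 11669 15
```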

I applied the fix to enwiki in this edit, purged w:António José Seguro, and confirmed it now has Cache expiry: 83775, which is 23.2 hours, until the next midnight.

Mentioned in SAL (#wikimedia-operations) [2026-02-18T00:58:21Z] <Krinkle> Edit Module:Date on various wikis in attempt to mitigate T416616, T416540. Details at https://phabricator.wikimedia.org/T416616#11625838.

I can't think of a reason you'd want to seed the PRNG with the current date except to implement a "random foo of the day" feature, in which case you explicitly want the expiry.

It is common practice to seed the PRNG with the current timestamp when it's intended to be random-like (see the linked document's example; note that mw.lua seems to set the seed to 1 by default). That does not mean the result should expire earlier.

Change #1240300 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode

https://gerrit.wikimedia.org/r/1240300

To answer @Krinkle's sidebar: yes, there was some disagreement about what the minimum allowable cache expiry is, or should be. The tradeoff being that if you have a "safe" expiration like "every year", very occasionally you'll regenerate that page just before the turn of the year, and do you want to arbitrarily increase the cache expiration in that case, just because you were "unlucky" in when it was parsed?

As you note, the original "minimum expiry" from wikitext constructs was 15 seconds, which seemed far too short. And of course Lua had a bunch of cases of setTTL(1) which I can only hope never actually did anything.

When I landed 1202219 I settled on a 30 minute minimum as a reasonable compromise that would still 'on average' give precise expiration times for "expires on the hour" templates, which seemed like the smallest time unit we would want to actually support.

"Daily updates" is the biggest on-wiki user ("on this day", daily updates to main page, featured articles, etc) so 6 or 12 hours is probably the largest we could set the minimum to.

I think we probably want to distinguish "happens to expire in <short time period> because we were unlucky about when it was rendered" from "expires every <short time period>", and keep a list (lint, tracking category, some other mechanism) of the latter (especially high-page-view instances of the latter) so we can continue to monitor and intervene if necessary.
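
One way to picture that distinction is to record the period of a time-based value separately from the time remaining on a given render. The Python sketch below is only an illustration: ExpiryRequest, its fields, and the one-day threshold are assumptions, and the sample numbers are borrowed from or invented alongside the examples in this thread.

```python
from dataclasses import dataclass

DAY = 86400

@dataclass
class ExpiryRequest:
    source: str     # what asked for the expiry (illustrative label)
    period: int     # how often the value changes, in seconds
    remaining: int  # seconds until the next change from this render

def recurring_short_expiries(requests, max_period=DAY):
    """Keep requests that recur at least daily, ignoring unlucky renders."""
    return [r.source for r in requests if r.period <= max_period]

requests = [
    ExpiryRequest("current year, rendered 10 min before New Year", 365 * DAY, 600),
    ExpiryRequest("current hour", 3600, 2870),
    ExpiryRequest("current day", DAY, 7671),
]
print(recurring_short_expiries(requests))
```

The yearly value has the shortest remaining time here but is not listed, because it will not recur for another year.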

@cscott Thanks, yeah, there's a couple edge cases that make "next midnight" unappealing, including:

  • local midnight (if the page has a reasonable daily rhythm aligned to the local timezone, extending it to UTC midnight means stale content for most of the day). This could be mitigated by doing both.
  • the unlucky case you describe. I assume this is where parsing starts pre-boundary but lasts into the boundary and so brings it into the full day, which is not great.

I see how you arrive at 30 min as a minimum, which I think is fine as a minimum for dynamic windows that are usually 30 days or 24 hours but can sometimes be near a cut. The part where I think it is too short, as we're seeing in the research, is where it essentially becomes a default. But this is something we control.

Before November, the 15s minimum was afaik never exposed, except through the rhythm if you step into it just before. But the 30min minimum, due to Lua os.time() and os.date().sec mapping to 1s, exposes it a lot, whereas wikitext magic words gave such use cases a fixed 1h expiry, without cutting to the nearest hour or midnight (which also functions as a healthy stagger at the same time).

Perhaps that's worth a more surgical fix: leave the 30min minimum, but change high-precision sources to ask for an hour TTL, and thus leave the minimum for those with a longer cycle.

While I didn't write it in the above comment, I very much support the original idea here as well. Given the prominence of these high-resolution things, I don't want next-midnight to become the de facto TTL for most articles all the time.

TODO (T416616): When we provide a "compare point-in-time" function for templates like "birthday and age", we'll need to bypass this so that:

  • Cache can enjoy default/max expiry most of the year.
  • Cache can expire at an earlier time of day (e.g. 1PM in Berlin, or midnight locally in Sydney), so long as it is confined to a specific date.

I think Ori's patch, or something like it, would be great in giving editors the tools for a more precise expiry (including less than the minimum, if we increase it beyond 30min) behind the carrot that they provide a single exact moment, which should save us from scenarios where something accidentally churns every hour or day. The common birthday/age use case is like that: as long as you give us the date, we can cache 30 days most of the year, and then count down to it.

My worry is, given that we now know how widespread unintentional use of exact times is (i.e. not actually relied on in the output, but just as a way of keeping wikitext or Lua code simple and triggering it), adoption may be limited, and even if it is adopted a lot, it only takes 1 template to bring it down to 30 minutes.

Hence, I think we should also increase the default to midnight(s) and/or take away cache precision from functions that just return seconds/minutes, and leave the 30min minimum for dynamic cases only, where it won't be returned all the time but only when near an otherwise reasonable boundary.

I think we're on the same page here. I like your patch as a short term fix to prevent expirations less than daily. But we're seeing significant swings at midnight UTC (even though the option is to use local times, the dip seems to be correlated to midnight UTC, which I believe is also the 'local time' on enwiki, perhaps not coincidentally), of up to 20% of the cache hit rate, so I think we do need to drive those daily dips lower as well.

I think ultimately we're going to have to surface the list of pages with short expiry times in some way visible to wiki gnomes, who can then help us keep the size of that list to a reasonably small number.

I think ultimately we're going to have to surface the list of pages with short expiry times in some way visible to wiki gnomes, who can then help us keep the size of that list to a reasonably small number.

A single bug like the one in Module:Date will cause that list to balloon massively, and it won't assist with root-cause analysis.

Counterproposal: What if CoreMagicVariables::applyCacheExpiry() took an optional third argument, $source, that the caller can use to specify where the expiry is coming from? In the case of Scribunto, it might be the name of the Lua module currently being evaluated. The $source of the most-constraining (lowest) TTL would then be included in the NewPP limit report embedded in the HTML output.
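
A rough Python model of that bookkeeping, to make the idea concrete. ExpiryTracker, the report layout, and the sample TTL values are all hypothetical. The real change would be in PHP inside CoreMagicVariables::applyCacheExpiry() and the limit report code.

```python
DEFAULT_TTL = 2592000  # 30 days, the default expiry seen in this thread

class ExpiryTracker:
    """Tracks the lowest requested TTL and who asked for it."""

    def __init__(self):
        self.ttl = DEFAULT_TTL
        self.source = None

    def apply(self, ttl, source=None):
        # Only the most constraining (lowest) TTL and its source are kept.
        if ttl < self.ttl:
            self.ttl = ttl
            self.source = source

    def report(self):
        # Hypothetical layout, loosely modelled on the NewPP report lines.
        lines = [f"Cache expiry: {self.ttl}"]
        if self.source is not None:
            lines.append(f"Reduced expiry: true (source: {self.source})")
        return "\n".join(lines)

tracker = ExpiryTracker()
tracker.apply(40000, "Module:Citation/CS1/Date validation")  # illustrative value
tracker.apply(1802, "Module:Date")
print(tracker.report())
```

Because only the lowest TTL's source is kept, a single misbehaving module shows up by name in the report of every affected page.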

Change #1240299 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] Parser: use DI for wgMiserMode and NamespaceInfo in CoreParserFunctions

https://gerrit.wikimedia.org/r/1240299

Change #1240299 merged by jenkins-bot:

[mediawiki/core@master] Parser: use DI for wgMiserMode and NamespaceInfo in CoreParserFunctions

https://gerrit.wikimedia.org/r/1240299

To answer @Krinkle's sidebar: yes, there was some disagreement about what the minimum allowable cache expiry is, or should be. The tradeoff being that if you have a "safe" expiration like "every year", very occasionally you'll regenerate that page just before the turn of the year, and do you want to arbitrarily increase the cache expiration in that case, just because you were "unlucky" in when it was parsed? […]

As part of change 1240300 I've thought through various edge case scenarios and I'm unable to come up with a scenario where an unlucky parse would wrongly store results for a day (or a year) with my implementation. The closest I could come up with is the scenario that justifies the clockskew constant (see inline comment), but even there the worst case scenario is another 30min TTL, just as if it was genuinely parsed before midnight. The assumption I'm making is that $parser->getParseTime() is set before parsing begins and that "everything" consistently uses that.

As I'm writing this, I realize that Lua might not use that clock. I couldn't find any code where we pass the MW time into the Lua environment. We do have an os.date() wrapper for ParserCache adjustments, and an os.clock() wrapper to reduce time accuracy. The latter implies the clock is not frozen. On the other hand, that's a cpu clock, not wall clock. Anyway, a newer value would be fine. It's only a problem if it can go backwards. For example: Start the parse cleanly after midnight, but midway develop a new negative skew specifically in the Lua thread. I suppose with an NTP update that could happen at the OS-level. I don't know if that would apply to a running process, though? If not, then php-luasandbox should be fine. LuaStandalone uses sub processes, so that might be affected if so. In any case, that seems fixable, if we wanted to. (I had to draw the line somewhere, and I stopped digging here.)

Even if this is an edge case, let's say we have a stale cache even 1% of the time. Is that really a problem? We always have some out-of-sync content, for example when a template gets updated and refreshlinks hasn't reached a page yet, which sometimes takes weeks. In the meantime, many pages naturally get refreshed through unrelated changes such as other templates getting refreshed, wikidata changes propagating, edits, manual purges, etc. I don't think it's really a problem.

Getting back to the initial proposal in the task description: I'm realizing editors won't adopt these interfaces unless the calendar arithmetic is exactly consistent with what existing modules provide. I looked at Module:Date and Module:Age on enwiki in detail, but I'd like to do a broader survey. Here's my proposal:

  • Land changes 1240461 and 1241053. This will embed the name of the module responsible for reducing a page's TTL directly in the article HTML in the parser debug data.
  • Wait for them to roll out.
  • Sample random articles from a cross-section of wikis: zhwiki, jawiki, arwiki, fawiki, kowiki, viwiki, ruwiki, hewiki, thwiki, and hiwiki. (Selected for size and likelihood of having different date semantics.)
  • Investigate which modules constrain TTLs across these wikis and share the results.

@ori sounds good. I wonder if we can be cleverer with change 1240300 ("Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode") and enforce miser mode at the ParserOutput::getCacheExpiry() level, so that the ParserOutput itself still retains the "exact" minimum cache expiry requested. That would prevent 1240300 from interfering with the data collection from rGPAR1240461fb5f1.
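
A toy Python model of that split, to show why it keeps the data collection intact. This is not MediaWiki's ParserOutput: `ParserOutputSketch`, `get_exact_expiry` and the `miser_floor` parameter are hypothetical, and the real class has more inputs than this. The exact TTL is stored untouched, and the floor is applied only when the cache asks for the effective value.

```python
from typing import Optional


class ParserOutputSketch:
    """Toy model: stores the exact TTL and applies the miser floor only on read."""

    def __init__(self) -> None:
        self._exact_ttl: Optional[int] = None  # None means "no limit requested"

    def update_cache_expiry(self, ttl: int) -> None:
        # Keep the smallest TTL any parser function or module asked for.
        if self._exact_ttl is None or ttl < self._exact_ttl:
            self._exact_ttl = ttl

    def get_exact_expiry(self) -> Optional[int]:
        # Unclamped value, e.g. for debug data or TTL-attribution reports.
        return self._exact_ttl

    def get_cache_expiry(self, default_ttl: int, miser_floor: int = 0) -> int:
        # Effective value: the exact request, raised to the floor,
        # but never above the default TTL.
        ttl = default_ttl if self._exact_ttl is None else min(self._exact_ttl, default_ttl)
        return max(ttl, min(miser_floor, default_ttl))
```

With this shape, a module that requests a 1 second TTL still shows up as "1" in the exact value even when the cache stores the page until midnight.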

I updated the miser mode patch as described in the previous comment.

In the reviews for https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Scribunto/+/1239867 @ori asked me to describe my thoughts on the core-vs-extension split here.

We currently have date-sensitive functions in three places:

  1. Core: {{currentmonth...}} {{currentday...}} {{currentyear}} {{currenthour}} {{currentweek}} {{currentdow}} {{localmonth...}} {{localday...}} {{localyear}} {{localhour}} {{localweek}} {{localdow}}
    • also {{currenttime}} and {{localtime}} which do *not* set a cache expiration
  2. Scribunto:
    • env.os.date sets an appropriate TTL whenever any specific field of the date object is read, including TTL=1 if you read seconds.
    • also env.os.time sets TTL(1) if you ask for the current time; this is *unlike* parser functions {{currenttime}}/{{localtime}} and could perhaps be changed
    • date formatting also sets TTLs if you are formatting the current time.
  3. ParserFunctions:
    • {{#time}}, {{#timel}}, {{#timef}} and {{#timefl}} do date formatting similar to core and Scribunto, and set TTLs in the same way
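
For readers unfamiliar with the "TTL depends on which field you read" behaviour listed above, here is a small Python sketch of the idea. It is not Scribunto's env.os.date implementation. `TtlTrackingDate` is a hypothetical name, and the mapping of year to "next 1 January" and day to "next midnight" is an assumption about what "appropriate" means; only the "TTL=1 for seconds" rule comes from the list above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class TtlTrackingDate:
    """Current-time view that records the shortest TTL implied by the fields read."""

    def __init__(self, now: datetime) -> None:
        self._now = now
        self.ttl: Optional[int] = None  # None until some field is read

    def _limit(self, boundary: datetime) -> None:
        seconds = max(1, int((boundary - self._now).total_seconds()))
        self.ttl = seconds if self.ttl is None else min(self.ttl, seconds)

    @property
    def year(self) -> int:
        # Assumed: valid until the next 1 January.
        self._limit(datetime(self._now.year + 1, 1, 1, tzinfo=self._now.tzinfo))
        return self._now.year

    @property
    def day(self) -> int:
        # Assumed: valid until the next midnight.
        midnight = self._now.replace(hour=0, minute=0, second=0, microsecond=0)
        self._limit(midnight + timedelta(days=1))
        return self._now.day

    @property
    def second(self) -> int:
        # Changes every second, so the TTL drops to 1.
        self.ttl = 1
        return self._now.second


d = TtlTrackingDate(datetime(2026, 3, 10, 20, 25, 45, tzinfo=timezone.utc))
d.year  # d.ttl is now the time left until 1 January 2027
d.day   # d.ttl shrinks to the time left until midnight
```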

Now obviously it would be great if we could reduce duplication here. Ideally none of the wikitext functions should exist, and you'd use a scribunto module for any date arithmetic.

But there are almost certainly existing pages which are doing comparisons based on {{currentday}} or {{#time}} for which it would be "too much work" to introduce a new Scribunto module just to replace them. So my proposal is to provide at least basic date comparison functionality via a parser function as well, perhaps {{#istodaybefore|<date 1>}} or {{#istodayafter|<date 2>}}. The goal would be to provide *minimal* functionality to replace any existing uses of the {{#time}} or {{current...}} functions without requiring a Scribunto module to be written, and providing cache-safe date comparison functions *seems* like a core functionality to me.

So I'd prefer adding {{#istodaybefore}} and {{#istodayafter}} to core, and exporting the date-comparison/cache-expiration computation so that it can be reused for the slightly more complete list of methods in @ori's current Scribunto patch. We should be able to add the full set of comparisons from @ori's patch ({{#istodayequal}}, {{#comparetotoday}} and {{#difffromtoday}}) in the future if there is demand, without substantially refactoring anything. The goal is to make the basic before/after comparisons possible in wikitext; if you're doing anything more complex than that, you should probably still be using Scribunto. I don't propose adding anything to ParserFunctions.
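
The shared comparison/expiry computation could look roughly like the following Python sketch. It is an illustration of the straw proposal in the task description, not the code in either patch: the function names and the (result, TTL) return shape are hypothetical, and only UTC is handled (the proposed "wiki local" option is omitted). The point is that the TTL runs until the answer can flip, and that an answer which can never flip again sets no TTL at all.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional, Tuple


def is_today_before(target: date, now: datetime) -> Tuple[bool, Optional[int]]:
    """Return (today < target, TTL in seconds, or None if the answer is permanent)."""
    now = now.astimezone(timezone.utc)
    if now.date() < target:
        # The answer flips when `target` begins, so cache until then.
        flip = datetime.combine(target, time.min, tzinfo=timezone.utc)
        return True, int((flip - now).total_seconds())
    # Once today >= target, the answer stays False forever.
    return False, None


def is_today_after(target: date, now: datetime) -> Tuple[bool, Optional[int]]:
    """Return (today > target, TTL in seconds, or None if the answer is permanent)."""
    now = now.astimezone(timezone.utc)
    if now.date() > target:
        # Once true, "target is in the past" never becomes false again.
        return True, None
    # Flips to True at the start of the day after `target`.
    flip = datetime.combine(target + timedelta(days=1), time.min, tzinfo=timezone.utc)
    return False, int((flip - now).total_seconds())
```

For a page parsed on 2026-03-10 that checks a date two weeks out, this yields a TTL of about two weeks instead of "until tonight's midnight".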

@cscott Thanks for the explanation. That sounds reasonable.

But there are almost certainly existing pages which are doing comparisons based on {{currentday}} or {{#time}} for which it would be "too much work" to introduce a new Scribunto module just to replace them.

Ack. Having a concrete example would be helpful -- e.g., to cite as motivation in the commit message for the core change.

I updated the miser mode patch as described in the previous comment.

Ooh, thanks for that. Sorry, I missed this comment.

Change #1240300 merged by jenkins-bot:

[mediawiki/core@master] Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode

https://gerrit.wikimedia.org/r/1240300

Change #1250013 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@wmf/1.46.0-wmf.18] Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode

https://gerrit.wikimedia.org/r/1250013

Change #1250013 abandoned by C. Scott Ananian:

[mediawiki/core@wmf/1.46.0-wmf.18] Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode

Reason:

Meant to backport this to wmf.19, not wmf.18

https://gerrit.wikimedia.org/r/1250013

Change #1250015 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@wmf/1.46.0-wmf.19] Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode

https://gerrit.wikimedia.org/r/1250015

Change #1250015 merged by jenkins-bot:

[mediawiki/core@wmf/1.46.0-wmf.19] Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode

https://gerrit.wikimedia.org/r/1250015

Mentioned in SAL (#wikimedia-operations) [2026-03-10T20:25:45Z] <jforrester@deploy2002> Started scap sync-world: Backport for [[gerrit:1240012|Enable personal main menu to all users in Minerva Neue skin (T413912)]], [[gerrit:1250007|Enables legacy processing in ParserOutputPostCacheTransform when cached (T372592)]], [[gerrit:1250015|Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode (T416616 T416540 T419439)]]

Mentioned in SAL (#wikimedia-operations) [2026-03-10T20:27:51Z] <jforrester@deploy2002> jforrester, cscott, bwang: Backport for [[gerrit:1240012|Enable personal main menu to all users in Minerva Neue skin (T413912)]], [[gerrit:1250007|Enables legacy processing in ParserOutputPostCacheTransform when cached (T372592)]], [[gerrit:1250015|Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode (T416616 T416540 T419439)]] synced to the testservers (see https://wikitech.wi

Mentioned in SAL (#wikimedia-operations) [2026-03-10T20:38:43Z] <jforrester@deploy2002> Finished scap sync-world: Backport for [[gerrit:1240012|Enable personal main menu to all users in Minerva Neue skin (T413912)]], [[gerrit:1250007|Enables legacy processing in ParserOutputPostCacheTransform when cached (T372592)]], [[gerrit:1250015|Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode (T416616 T416540 T419439)]] (duration: 12m 58s)