Per @Catrope, we have reason to believe that clients (Varnish and/or browsers) may be caching the startup module longer than they should.
When they deployed a change to a configuration variable embedded in mw.config, it took several hours of slow ramp-up before all clients were running with the new values. The config change in question changed how VisualEditor loads the HTML data model (e.g. from RESTBase instead of Parsoid).
This observation is based on how the traffic volume changed for those two endpoints.
I propose we run a campaign in WikimediaEvents/EventLogging (or, perhaps simpler, using statsv) that essentially just reflects back the value of a dummy config var. After a week or so, we change its value and watch how long clients keep reporting the old one.
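A minimal sketch of the client-side half of such a probe, assuming the statsv route. The config key `wgStartupCacheProbe` and the metric prefix are hypothetical placeholders, and the `counter.` topic prefix assumes the usual WikimediaEvents statsv subscriber is loaded. The `mw` object is passed in so the function can also be exercised outside a browser.

```javascript
// Reports which value of the dummy config var this client received.
// 'wgStartupCacheProbe' and the metric prefix are hypothetical names.
function reportStartupProbe( mw ) {
	// Value that was embedded in mw.config when the startup module was generated.
	var value = mw.config.get( 'wgStartupCacheProbe' );

	// Keep the metric name safe: unknown keys become "missing",
	// anything else is reduced to letters, digits and underscores.
	var label = ( value === null || value === undefined ) ?
		'missing' :
		String( value ).replace( /[^A-Za-z0-9_]/g, '_' );

	var topic = 'counter.MediaWiki.startupCacheProbe.' + label;

	// Assumption: a statsv subscriber turns 'counter.*' topics into counters.
	mw.track( topic, 1 );
	return topic;
}
```

In the browser this would be called once per page view as `reportStartupProbe( mw );`. After the value is changed server-side, the rate at which the counter for the old value decays shows how long stale startup responses stay in use.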
If we determine that this is indeed problematic, one potential remedy would be adding must-revalidate to our Cache-Control header for short-lived load.php responses.
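For illustration, a short-lived response with the directive added might carry a header like the one below. The 300-second lifetimes are an assumption for this sketch, not a confirmed value for load.php. With must-revalidate, a cache that holds a stale copy has to revalidate it with the origin before reusing it, instead of serving it stale.

```
Cache-Control: public, max-age=300, s-maxage=300, must-revalidate
```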