Thank you @Mholloway for the quick fix!
I'd second having the polyfill added in general. Perhaps we could put it in a separate file and then include it in all of the translators? IMO, that way we wouldn't have conflicts when rebasing on top of upstream.
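For illustration, the shared file could look something like this (a minimal TypeScript sketch; the file name and the choice of Array.prototype.includes as the polyfilled method are just assumptions for the example, not what the actual patch does):

```typescript
// polyfills.ts -- hypothetical shared file, included by every translator.
// The existence check makes this a no-op on engines that already ship
// the method, and keeping it out of the individual translators should
// avoid conflicts when rebasing on top of upstream.
if (!Array.prototype.includes) {
    Array.prototype.includes = function (this: unknown[], searchElement: unknown, fromIndex?: number): boolean {
        const start = Math.max(fromIndex ?? 0, 0);
        for (let i = start; i < this.length; i += 1) {
            if (this[i] === searchElement) {
                return true;
            }
        }
        return false;
    };
}
```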
Wed, Apr 18
The Parsoid counterpart has already been filed and declined in T192388: enwiki page fails to parse. And yes, the reason there is that the table is too big.
We should do the simple work-around for now, because we may need to re-evaluate the decision not to pre-generate everything, given the plan to separate the content into multiple endpoints and throw language variants into the mix.
Actually, we cannot move the Wikitech jobs in their current form because of security implications, so I'm declining the ticket.
It turns out Wikitech does not use the Redis-based queue, so we'll defer this until later.
Indeed @Mholloway, Parsoid is now declining to parse it, which is no wonder since the page is a huge table of exoplanets. That's a bit strange, though, since I saw the error in the MCS logs for it.
On the back-end side things are looking good. After the initial spike in requests that missed Varnish, we are back to normal levels, which indicates that the majority of requests are served from cache. Errors look good too: no noticeable change in error rates for parsing or serving summaries.
Tue, Apr 17
Also happens for /en.wikipedia.org/v1/page/summary/List_of_exoplanets_(full) (the same stack trace too).
Looking at the logs, there have been 2000 occurrences of this error for this page in the last 24h, 17k in the last 7 days.
Mon, Apr 16
It took us a while to find the root cause of this. Essentially, the problem was that the ChangeNotification jobs were being created/dispatched via a cron script. In that case, the EventBus-based JobQueue was assigning the originating wiki (Wikidata) instead of the intended recipient wiki. At the same time, whether this job injects RC records is configurable per wiki, and that setting is false for Wikidata.
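To make the failure mode concrete, here's a rough TypeScript sketch (the names are made up for illustration; the real code is PHP and lives in the EventBus/JobQueue glue):

```typescript
// Hypothetical sketch of the bug. Consumers route a job on the wiki
// recorded in its envelope, so stamping it with the wrong wiki sends
// the job to the wrong place.
interface JobEnvelope {
    type: string;
    wiki: string; // wiki the job should execute against
    params: Record<string, unknown>;
}

// Buggy behaviour: when dispatched from a cron script, the queue fell
// back to the wiki the script was running on (wikidatawiki) ...
function enqueueBuggy(type: string, params: Record<string, unknown>, originWiki: string): JobEnvelope {
    return { type, wiki: originWiki, params };
}

// ... while the intended recipient wiki should be passed through
// explicitly, independent of where the dispatching process runs.
function enqueueFixed(type: string, params: Record<string, unknown>, recipientWiki: string): JobEnvelope {
    return { type, wiki: recipientWiki, params };
}

// Net effect: ChangeNotification jobs landed on wikidatawiki, where
// injecting RC records is configured off, so they silently did nothing.
```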
Thank you, @Ottomata!
Fri, Apr 13
Indeed they are not. I opened T192157: Re-render wiktionary definitions on user purges to fix that. Expect the fix to be deployed early next week.
We have decided to put it on Ganeti for now, so I'm resolving this task.
Thu, Apr 12
The immediate problem has been dealt with by switching the job back onto the Redis-based queue. Let's keep the ticket open until PHP serialisation is removed from TranslateUpdateJob and the job is ported to JSON-serialised parameters.
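For reference, the constraint on the job parameters is roughly this (a sketch only; TranslateUpdateJob is PHP and the field names below are invented):

```typescript
// Hypothetical shape of JSON-serialisable job parameters: plain data
// only, no objects that need PHP serialize()/unserialize() to survive.
interface TranslateUpdateJobParams {
    title: string;   // page title as text, not a Title object
    content: string; // new wikitext
    fuzzy: boolean;  // whether to mark translations as outdated
}

// The EventBus-based queue ships parameters as JSON, so they must come
// back identical after a round trip through it.
function survivesJsonRoundTrip(params: TranslateUpdateJobParams): boolean {
    const copy = JSON.parse(JSON.stringify(params)) as TranslateUpdateJobParams;
    return JSON.stringify(copy) === JSON.stringify(params);
}
```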
Wed, Apr 11
For posterity, in today's JobQueue biweekly meeting we agreed that having unencrypted mirroring of private wikis' data is not acceptable. Given that TLS work for EventBus' MirrorMaker is scheduled for next quarter, we have decided to temporarily switch mirroring off until the TLS work is done. In this way, we can proceed with enabling support for private wikis and start decommissioning the Redis machines tied to the old JobQueue transport mechanism.
IMHO, all deployment charts should reside in the same repository / on the same (sub)domain, regardless of whether they are for prod, CI or development use. The different use cases can be namespaced / separated by URL paths. From that perspective, integration.wm.o seems less than ideal.
It's failing only on deploy1001, so lowering the priority.
It seems that the reimage is now blocking deployments, cf. T191972: Scap sync-file failing for deploy1001.eqiad.wmnet
Apparently deploy1001 has recently been reimaged. However, it doesn't seem to have a role associated with it at this time (it's not a deployment server; otherwise I would have been able to log in). As a consequence, the failing mw hosts reject connections coming from it, since its host key has changed.
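Conceptually, the rejection works like strict SSH host-key checking (a sketch only, not how scap or SSH actually implement it):

```typescript
// Hypothetical sketch: each mw host remembers the key fingerprint it
// last saw for deploy1001. After a reimage the server presents a new
// key, the fingerprints no longer match, and the connection is refused
// until the stored entry is updated.
const knownHosts = new Map<string, string>(); // host -> stored fingerprint

function acceptConnection(host: string, presentedFingerprint: string): boolean {
    const stored = knownHosts.get(host);
    // Unknown or mismatching key: refuse under strict checking.
    return stored !== undefined && stored === presentedFingerprint;
}

knownHosts.set('deploy1001.eqiad.wmnet', 'SHA256:old-fingerprint');
// After the reimage, the new key's fingerprint differs:
acceptConnection('deploy1001.eqiad.wmnet', 'SHA256:new-fingerprint'); // false
```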
Tue, Apr 10
The fix has been deployed. Note, however, that the correct content type will appear gradually as wiktionary definitions are updated.
Fri, Apr 6
Re-prioritising due to the recent occurrence of this.