
SUP consumer stuck in codfw
Closed, Resolved · Public

Description

SiteInfoMaxPageIdLookup is failing with:

java.lang.RuntimeException: SplitFetcher thread 0 received unexpected exception while polling the records
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:165)
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:114)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Missing property '/query/general/max-page-id'
	at org.wikimedia.discovery.cirrus.updater.common.model.JsonPathUtils.getRequiredNode(JsonPathUtils.java:13)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SiteInfoMaxPageIdLookup.parse(SiteInfoMaxPageIdLookup.java:39)
	at org.wikimedia.discovery.cirrus.updater.common.http.MediaWikiHttpClient.lambda$fetch$0(MediaWikiHttpClient.java:80)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:247)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:188)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:162)
	at org.wikimedia.discovery.cirrus.updater.common.http.MediaWikiHttpClient.fetch(MediaWikiHttpClient.java:73)
	at org.wikimedia.discovery.cirrus.updater.common.http.MediaWikiHttpClient.load(MediaWikiHttpClient.java:62)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SiteInfoMaxPageIdLookup.apply(SiteInfoMaxPageIdLookup.java:27)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SiteInfoMaxPageIdLookup.apply(SiteInfoMaxPageIdLookup.java:15)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SaneitizeLoop.refreshMaxPageId(SaneitizeLoop.java:255)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SaneitizeLoop.next(SaneitizeLoop.java:141)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SanitySourceSplitReader.lambda$fetch$1(SanitySourceSplitReader.java:65)
	at java.base/java.util.HashMap.forEach(HashMap.java:1337)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SanitySourceSplitReader.fetch(SanitySourceSplitReader.java:63)
	at org.wikimedia.discovery.cirrus.updater.consumer.sanity.SanitySourceSplitReader.fetch(SanitySourceSplitReader.java:57)
	at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:162)
	... 6 more

AC:

  • understand the cause and fix it

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2025-03-01T18:30:49Z] <dcausse> disabling the saneitizer on the cirrus streaming updater in codfw (hotfix for T387625)

Mentioned in SAL (#wikimedia-operations) [2025-03-01T18:37:52Z] <dcausse> disabling the saneitizer on the cirrus streaming updater for consumer-search@eqiad & consumer-cloudelastic (pre-emptive hotfix for T387625)

The current hypothesis is that one wiki returns a response the job cannot parse for /w/api.php?action=query&meta=siteinfo&format=json&formatversion=2: the /query/general/max-page-id entry is missing.
Disabling the saneitizer unblocked the job.
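To illustrate the failure mode, here is a rough Python analogue of a "required node" lookup. It is a sketch only, not the updater's Java code: the function name is borrowed from the stack trace, and the sample bodies and the value 12345 are made up.

```python
from typing import Any


def get_required_node(doc: Any, pointer: str) -> Any:
    """Follow a '/'-separated path through nested dicts; fail loudly if a key is absent."""
    node = doc
    for key in pointer.strip("/").split("/"):
        if not isinstance(node, dict) or key not in node:
            raise ValueError(f"Missing property '{pointer}'")
        node = node[key]
    return node


# Trimmed, made-up example of a healthy siteinfo body.
healthy = {"query": {"general": {"sitename": "Example", "max-page-id": 12345}}}
# Any body without a 'query' object fails the lookup, whatever it contains instead.
unexpected = {"servedby": "some-host"}

print(get_required_node(healthy, "/query/general/max-page-id"))
try:
    get_required_node(unexpected, "/query/general/max-page-id")
except ValueError as exc:
    print(exc)
```

The point is that the lookup reports only what is missing, not what the wiki actually sent back, which is why the response body had to be inspected separately.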

dcausse triaged this task as High priority. Mar 3 2025, 12:41 PM

Looping over all the wikis, a single private wiki returns an HTTP 200 with:

{'error': {'code': 'readapidenied', 'info': 'You need read permission to use this module.', 'docref': 'See https://arbcom-ru.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at &lt;https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/&gt; for notice of API deprecations and breaking changes.'}, 'servedby': 'mw-api-int.codfw.main-6f4777f4f9-8pwrt'}
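A check like the one above could be scripted roughly as follows. This is a sketch, not the script that was actually used: `classify`, `find_broken_wikis`, the injectable `fetch` callable and the example.org domains are all hypothetical. A real `fetch` would issue the HTTP request and return the status code with the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

SITEINFO_QUERY = "/w/api.php?action=query&meta=siteinfo&format=json&formatversion=2"

Response = Tuple[int, Dict[str, Any]]


def classify(body: Dict[str, Any]) -> str:
    """Return 'ok', 'api-error:<code>' or 'missing-max-page-id' for a decoded body."""
    error = body.get("error")
    if isinstance(error, dict):
        # The API can report an error in the body while the HTTP status is still 200.
        return f"api-error:{error.get('code', 'unknown')}"
    query = body.get("query")
    general = query.get("general") if isinstance(query, dict) else None
    if not isinstance(general, dict) or "max-page-id" not in general:
        return "missing-max-page-id"
    return "ok"


def find_broken_wikis(
    domains: Iterable[str], fetch: Callable[[str], Response]
) -> List[Tuple[str, int, str]]:
    """List (domain, status, verdict) for every wiki whose response is unusable."""
    broken = []
    for domain in domains:
        status, body = fetch(f"https://{domain}{SITEINFO_QUERY}")
        verdict = classify(body)
        if status != 200 or verdict != "ok":
            broken.append((domain, status, verdict))
    return broken


# Canned responses stand in for the network so the sketch runs offline.
canned: Dict[str, Response] = {
    "public.example.org": (200, {"query": {"general": {"max-page-id": 42}}}),
    "private.example.org": (200, {"error": {"code": "readapidenied"}}),
}


def fake_fetch(url: str) -> Response:
    return canned[url.split("/")[2]]  # "https://<domain>/w/api.php..." -> <domain>


print(find_broken_wikis(canned, fake_fetch))
```

Checking the body for an `error` object matters here because the HTTP status alone (200) gives no hint that the request was denied.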

The reason is not yet clear: this wiki is not new and does not seem to have had its config changed recently.
This problem is now affecting the producer, causing it to fail.

The readapidenied error happened because an admin on a private wiki blocked the Cirrus_Streaming_Updater user. Unblocking this user fixed the issue. We might want to make the pipeline more robust to this with https://gitlab.wikimedia.org/repos/search-platform/cirrus-streaming-updater/-/merge_requests/173 and we could consider filing a feature request to make some users unblockable, to prevent such mistakes in the future.
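One possible shape for "more robust" is to treat an unreadable wiki as a per-wiki warning instead of a job-wide failure. The sketch below is an assumption about what that could look like, not the content of the linked merge request. All names (`SiteInfoUnavailable`, `parse_max_page_id`, `refresh_max_page_id`, the `load` callable) are hypothetical, and falling back to the previous value is only one of several reasonable policies.

```python
import logging
from typing import Any, Callable, Dict, Optional

log = logging.getLogger("saneitizer-sketch")


class SiteInfoUnavailable(Exception):
    """Raised when a wiki does not return a usable max-page-id."""


def parse_max_page_id(body: Dict[str, Any]) -> int:
    """Extract max-page-id, turning API errors and missing fields into one exception type."""
    error = body.get("error")
    if isinstance(error, dict):
        raise SiteInfoUnavailable(f"API error: {error.get('code', 'unknown')}")
    try:
        return int(body["query"]["general"]["max-page-id"])
    except (KeyError, TypeError, ValueError) as exc:
        raise SiteInfoUnavailable("missing /query/general/max-page-id") from exc


def refresh_max_page_id(
    wiki: str,
    load: Callable[[str], Dict[str, Any]],
    previous: Optional[int],
) -> Optional[int]:
    """Return the fresh max-page-id, or the previous one if this wiki cannot be read."""
    try:
        return parse_max_page_id(load(wiki))
    except SiteInfoUnavailable as exc:
        # Log and carry on so one blocked wiki does not stall every other wiki.
        log.warning("skipping max-page-id refresh for %s: %s", wiki, exc)
        return previous


def healthy(wiki: str) -> Dict[str, Any]:
    return {"query": {"general": {"max-page-id": 99}}}


def blocked(wiki: str) -> Dict[str, Any]:
    return {"error": {"code": "readapidenied"}}


print(refresh_max_page_id("examplewiki", healthy, previous=None))
print(refresh_max_page_id("examplewiki", blocked, previous=7))
```

The warning still surfaces the API error code, so a blocked account would show up in the logs rather than silently degrading the saneitizer for that wiki.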

EBernhardson subscribed.

Latest version of SUP released and deployed for all consumers and producers.