The item Q125918173 is missing from elastic@codfw.
Investigating further, the corresponding page_change events appear to have been produced to eqiad.cirrussearch.update_pipeline.update.rc0 but then failed to make it through elasticsearch.
Investigating the error queue, we can see that consumer-cloudelastic failed with:
com.fasterxml.jackson.core.exc.InputCoercionException: Numeric value (2153607827) out of range of int (-2147483648 - 2147483647) at [Source: UNKNOWN; line: 1, column: 127]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInputCoercion(ParserMinimalBase.java:638)
    at com.fasterxml.jackson.core.base.ParserMinimalBase.reportOverflowInt(ParserMinimalBase.java:607)
    at com.fasterxml.jackson.core.base.ParserBase.convertNumberToInt(ParserBase.java:1036)
    at com.fasterxml.jackson.core.base.ParserBase._parseIntValue(ParserBase.java:960)
    at com.fasterxml.jackson.core.base.ParserBase.getIntValue(ParserBase.java:766)
    at com.ok2c.hc5.json.JsonTokenEventHandlerAdaptor.accept(JsonTokenEventHandlerAdaptor.java:56)
    at com.ok2c.hc5.json.JsonAsyncTokenizer.processData(JsonAsyncTokenizer.java:61)
    at com.ok2c.hc5.json.JsonAsyncTokenizer.consume(JsonAsyncTokenizer.java:88)
    at com.ok2c.hc5.json.http.AbstractJsonEntityConsumer.consume(AbstractJsonEntityConsumer.java:74)
    at com.ok2c.hc5.json.http.AbstractJsonMessageConsumer.consume(AbstractJsonMessageConsumer.java:102)
    at org.wikimedia.discovery.cirrus.updater.common.http.GzipJsonAsyncResponseConsumer.consumeSafely(GzipJsonAsyncResponseConsumer.java:157)
    at org.wikimedia.discovery.cirrus.updater.common.util.GzipInflater.inflatePayload(GzipInflater.java:103)
    at org.wikimedia.discovery.cirrus.updater.common.util.GzipInflater.inflate(GzipInflater.java:78)
    at org.wikimedia.discovery.cirrus.updater.common.http.GzipJsonAsyncResponseConsumer.consume(GzipJsonAsyncResponseConsumer.java:139)
    at org.apache.hc.client5.http.impl.async.HttpAsyncMainClientExec$1.consume(HttpAsyncMainClientExec.java:243)
    at org.apache.hc.core5.http.impl.nio.ClientHttp1StreamHandler.consumeData(ClientHttp1StreamHandler.java:255)
    at org.apache.hc.core5.http.impl.nio.ClientHttp1StreamDuplexer.consumeData(ClientHttp1StreamDuplexer.java:354)
    at org.apache.hc.core5.http.impl.nio.AbstractHttp1StreamDuplexer.onInput(AbstractHttp1StreamDuplexer.java:325)
    at org.apache.hc.core5.http.impl.nio.AbstractHttp1IOEventHandler.inputReady(AbstractHttp1IOEventHandler.java:64)
    at org.apache.hc.core5.http.impl.nio.ClientHttp1IOEventHandler.inputReady(ClientHttp1IOEventHandler.java:41)
    at org.apache.hc.core5.reactor.InternalDataChannel.onIOEvent(InternalDataChannel.java:142)
    at org.apache.hc.core5.reactor.InternalChannel.handleIOEvent(InternalChannel.java:51)
    at org.apache.hc.core5.reactor.SingleCoreIOReactor.processEvents(SingleCoreIOReactor.java:178)
    at org.apache.hc.core5.reactor.SingleCoreIOReactor.doExecute(SingleCoreIOReactor.java:127)
    at org.apache.hc.core5.reactor.AbstractSingleCoreIOReactor.execute(AbstractSingleCoreIOReactor.java:86)
    at org.apache.hc.core5.reactor.IOReactorWorker.run(IOReactorWorker.java:44)
    at java.base/java.lang.Thread.run(Thread.java:
This suggests that the JSON parser tries to coerce every numeric value into a Java int, which overflows for values above Integer.MAX_VALUE (2147483647), such as 2153607827 here.
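A minimal sketch of the overflow (plain JDK, not the actual SUP/Jackson code path): the value from the exception message fits in a Java long but not in an int, so any token handler that calls getIntValue() on it, as JsonTokenEventHandlerAdaptor appears to in the trace above, will fail; reading the token as a long would not.

```java
// Illustrative only: shows why 2153607827 cannot be coerced to a Java int.
// The real fix would be in the SUP consumer's token handler, which should
// read numeric tokens as long (e.g. Jackson's getLongValue) rather than int.
public class IntCoercionDemo {
    public static void main(String[] args) {
        String value = "2153607827"; // large id seen in the InputCoercionException

        // Parsing as long succeeds: the value fits easily in 64 bits.
        long asLong = Long.parseLong(value);
        System.out.println("as long: " + asLong);
        System.out.println("exceeds Integer.MAX_VALUE: " + (asLong > Integer.MAX_VALUE));

        // Parsing as int fails, mirroring the coercion error:
        // 2153607827 > Integer.MAX_VALUE (2147483647).
        try {
            Integer.parseInt(value);
        } catch (NumberFormatException e) {
            System.out.println("int overflow: " + e.getMessage());
        }
    }
}
```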
What is not clear from P62377 is why the error queue was populated by the cloudelastic consumer but not by the codfw one.
AC:
- the SUP consumer should be able to parse all integers produced by CirrusSearch
- understand why the SUP consumer@codfw did not produce any errors in its error stream
- the errors were produced to codfw.cirrussearch.update_pipeline.fetch_error instead of codfw.cirrussearch.update_pipeline.fetch_error.rc0; the fix is https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1031441
- possibly add an alert
- backfill wikidata in cloudelastic and codfw
- backfill for codfw started on May 15th at 07:00 UTC: python3 cirrus_reindex.py codfw wikidatawiki backfill 2024-05-07T23:30:00 2024-05-14T16:15:00
- backfill of cloudelastic done on May 16th using python3 cirrus_reindex.py cloudelastic wikidatawiki backfill 2024-05-07T20:00:00 2024-05-14T22:00:00