
Q125918173 missing from elastic@codfw
Closed, Resolved · Public

Description

Reported via https://www.wikidata.org/wiki/Wikidata:Report_a_technical_problem/WDQS_and_Search#Item_not_searchable_by_initial_part_of_Label_or_its_alias

The item Q125918173 is missing from elastic@codfw.

The corresponding page_change events appear to have been produced to eqiad.cirrussearch.update_pipeline.update.rc0 but then failed to make it into Elasticsearch.

Looking at the error queue, we can see that consumer-cloudelastic failed with:

com.fasterxml.jackson.core.exc.InputCoercionException: Numeric value (2153607827) out of range of int (-2147483648 - 2147483647)
 at [Source: UNKNOWN; line: 1, column: 127]
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInputCoercion(ParserMinimalBase.java:638)
        at com.fasterxml.jackson.core.base.ParserMinimalBase.reportOverflowInt(ParserMinimalBase.java:607)
        at com.fasterxml.jackson.core.base.ParserBase.convertNumberToInt(ParserBase.java:1036)
        at com.fasterxml.jackson.core.base.ParserBase._parseIntValue(ParserBase.java:960)
        at com.fasterxml.jackson.core.base.ParserBase.getIntValue(ParserBase.java:766)
        at com.ok2c.hc5.json.JsonTokenEventHandlerAdaptor.accept(JsonTokenEventHandlerAdaptor.java:56)
        at com.ok2c.hc5.json.JsonAsyncTokenizer.processData(JsonAsyncTokenizer.java:61)
        at com.ok2c.hc5.json.JsonAsyncTokenizer.consume(JsonAsyncTokenizer.java:88)
        at com.ok2c.hc5.json.http.AbstractJsonEntityConsumer.consume(AbstractJsonEntityConsumer.java:74)
        at com.ok2c.hc5.json.http.AbstractJsonMessageConsumer.consume(AbstractJsonMessageConsumer.java:102)
        at org.wikimedia.discovery.cirrus.updater.common.http.GzipJsonAsyncResponseConsumer.consumeSafely(GzipJsonAsyncResponseConsumer.java:157)
        at org.wikimedia.discovery.cirrus.updater.common.util.GzipInflater.inflatePayload(GzipInflater.java:103)
        at org.wikimedia.discovery.cirrus.updater.common.util.GzipInflater.inflate(GzipInflater.java:78)
        at org.wikimedia.discovery.cirrus.updater.common.http.GzipJsonAsyncResponseConsumer.consume(GzipJsonAsyncResponseConsumer.java:139)
        at org.apache.hc.client5.http.impl.async.HttpAsyncMainClientExec$1.consume(HttpAsyncMainClientExec.java:243)
        at org.apache.hc.core5.http.impl.nio.ClientHttp1StreamHandler.consumeData(ClientHttp1StreamHandler.java:255)
        at org.apache.hc.core5.http.impl.nio.ClientHttp1StreamDuplexer.consumeData(ClientHttp1StreamDuplexer.java:354)
        at org.apache.hc.core5.http.impl.nio.AbstractHttp1StreamDuplexer.onInput(AbstractHttp1StreamDuplexer.java:325)
        at org.apache.hc.core5.http.impl.nio.AbstractHttp1IOEventHandler.inputReady(AbstractHttp1IOEventHandler.java:64)
        at org.apache.hc.core5.http.impl.nio.ClientHttp1IOEventHandler.inputReady(ClientHttp1IOEventHandler.java:41)
        at org.apache.hc.core5.reactor.InternalDataChannel.onIOEvent(InternalDataChannel.java:142)
        at org.apache.hc.core5.reactor.InternalChannel.handleIOEvent(InternalChannel.java:51)
        at org.apache.hc.core5.reactor.SingleCoreIOReactor.processEvents(SingleCoreIOReactor.java:178)
        at org.apache.hc.core5.reactor.SingleCoreIOReactor.doExecute(SingleCoreIOReactor.java:127)
        at org.apache.hc.core5.reactor.AbstractSingleCoreIOReactor.execute(AbstractSingleCoreIOReactor.java:86)
        at org.apache.hc.core5.reactor.IOReactorWorker.run(IOReactorWorker.java:44)
        at java.base/java.lang.Thread.run(Thread.java:

This suggests that the JSON parser tries to read every integer as a Java int.
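To illustrate the overflow, here is a minimal sketch using only the JDK, not Jackson and not the actual SUP consumer code. The class and method names (IntegerWidthDemo, readAsInt, readAsNarrowestInteger) are hypothetical and exist only for this example. It shows that the value from the exception above does not fit in a Java int, and one possible way to read integers without assuming a fixed width.

```java
import java.math.BigInteger;

// Hypothetical demo class; not part of the cirrus-streaming-updater code base.
class IntegerWidthDemo {

    /** Mimics an int-only reader: throws NumberFormatException when the token overflows int. */
    public static int readAsInt(String token) {
        return Integer.parseInt(token);
    }

    /** Width-tolerant reader: returns an Integer, Long, or BigInteger, whichever is the narrowest fit. */
    public static Number readAsNarrowestInteger(String token) {
        BigInteger value = new BigInteger(token);
        // bitLength() excludes the sign bit, so 31 bits fit in int and 63 bits fit in long.
        if (value.bitLength() <= 31) {
            return Integer.valueOf(value.intValue());
        }
        if (value.bitLength() <= 63) {
            return Long.valueOf(value.longValue());
        }
        return value;
    }

    public static void main(String[] args) {
        String token = "2153607827"; // numeric value taken from the stack trace above

        try {
            readAsInt(token);
        } catch (NumberFormatException e) {
            System.out.println("int reader failed: " + e.getMessage());
        }

        Number parsed = readAsNarrowestInteger(token);
        System.out.println(parsed.getClass().getSimpleName() + " " + parsed);
    }
}
```

Choosing the narrowest type that fits keeps small values cheap while still accepting values above Integer.MAX_VALUE (2147483647). Whether the real fix should read such fields as long or use a different strategy depends on the consumer's JSON handling, which this sketch does not model.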

What is not clear from P62377 is why the error queue was populated by the cloudelastic consumer but not by the codfw one.

AC:

  • the SUP consumer should be able to parse all integers produced by CirrusSearch
  • understand why the SUP consumer@codfw did not produce any errors in its error stream
  • possibly add an alert
  • backfill wikidata in cloudelastic and codfw
    • backfill for codfw started on May 15th 7am UTC: python3 cirrus_reindex.py codfw wikidatawiki backfill 2024-05-07T23:30:00 2024-05-14T16:15:00
    • backfill of cloudelastic done on May 16th using python3 cirrus_reindex.py cloudelastic wikidatawiki backfill 2024-05-07T20:00:00 2024-05-14T22:00:00

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change #1031441 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/deployment-charts@master] cirrus-streaming-updater: fix the error topic

https://gerrit.wikimedia.org/r/1031441

Change #1031441 merged by jenkins-bot:

[operations/deployment-charts@master] cirrus-streaming-updater: fix the error topic

https://gerrit.wikimedia.org/r/1031441

Change #1031522 had a related patch set uploaded (by DCausse; author: DCausse):

[operations/alerts@master] cirrus: add alerts on fetch error rates

https://gerrit.wikimedia.org/r/1031522

dcausse triaged this task as High priority. May 15 2024, 7:42 AM
dcausse updated the task description.

Change #1031522 merged by jenkins-bot:

[operations/alerts@master] cirrus: add alerts on fetch error rates

https://gerrit.wikimedia.org/r/1031522

Gehel claimed this task.