Elasticsearch: illegal longitude value [219.38] for coordinates.coord
Closed, Resolved, Public

Description

After we upgraded codfw to Elasticsearch 2.3.3, we started to see exceptions during doc updates:

MapperParsingException[failed to parse]; nested: IllegalArgumentException[illegal longitude value [219.38] for coordinates.coord];
        at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:154)
        at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:309)
        at org.elasticsearch.index.shard.IndexShard.prepareIndex(IndexShard.java:580)
        at org.elasticsearch.index.shard.IndexShard.prepareIndexOnPrimary(IndexShard.java:559)
        at org.elasticsearch.action.index.TransportIndexAction.prepareIndexOperationOnPrimary(TransportIndexAction.java:212)
        at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:224)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardUpdateOperation(TransportShardBulkAction.java:389)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:191)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
        at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
        at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
        at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: illegal longitude value [219.38] for coordinates.coord
        at org.elasticsearch.index.mapper.geo.GeoPointFieldMapperLegacy.parse(GeoPointFieldMapperLegacy.java:334)
        at org.elasticsearch.index.mapper.geo.BaseGeoPointFieldMapper.parse(BaseGeoPointFieldMapper.java:484)
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:309)
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:326)
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:252)
        at org.elasticsearch.index.mapper.DocumentParser.parseObjectOrField(DocumentParser.java:306)
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:326)
        at org.elasticsearch.index.mapper.DocumentParser.parseObject(DocumentParser.java:252)
        at org.elasticsearch.index.mapper.DocumentParser.parseDocument(DocumentParser.java:122)
        ... 19 more

It happened to [enwiki_content][page][14640471] which seems to be Mars.
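
For context, the exception is consistent with a plain range check on the geo_point value: 219.38 falls outside the conventional [-180, 180] longitude range, which is plausible for a body like Mars, where longitudes are often given on a 0 to 360 scale. A minimal sketch of such a check follows, in Python rather than Elasticsearch's Java. The function name and the placeholder latitude are hypothetical, and the bounds are the conventional Earth ranges, not copied from the Elasticsearch source.

```python
def is_valid_earth_point(lat, lon):
    """Return True if (lat, lon) fits the conventional Earth ranges.

    Assumption: latitude in [-90, 90] and longitude in [-180, 180],
    both bounds inclusive.
    """
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# 219.38 is the longitude from the exception; 0.0 is a placeholder
# latitude, since the failing document's latitude is not shown here.
mars_like = is_valid_earth_point(0.0, 219.38)
earth_like = is_valid_earth_point(48.85, 2.35)
```

With these inputs `mars_like` is False and `earth_like` is True, so a point like the one in the trace would be rejected before indexing.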

Event Timeline

dcausse created this task. May 30 2016, 2:55 PM
Restricted Application added subscribers: Zppix, Aklapper. May 30 2016, 2:55 PM
Restricted Application added projects: Discovery, Discovery-Search. May 30 2016, 3:03 PM

Change 292026 had a related patch set uploaded (by MaxSem):
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292026

Change 292026 merged by jenkins-bot:
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292026

Change 292045 had a related patch set uploaded (by MaxSem):
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292045

Change 292047 had a related patch set uploaded (by MaxSem):
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292047

Change 292047 abandoned by MaxSem:
Don't index non-Earth coordinates

Reason:
8O

https://gerrit.wikimedia.org/r/292047

Change 292047 restored by MaxSem:
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292047

Change 292047 had a related patch set uploaded (by MaxSem):
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292047

Change 292045 merged by jenkins-bot:
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292045

Change 292047 merged by jenkins-bot:
Don't index non-Earth coordinates

https://gerrit.wikimedia.org/r/292047

Mentioned in SAL [2016-05-31T23:47:19Z] <dereckson@tin> Synchronized php-1.28.0-wmf.4/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 24s)

Mentioned in SAL [2016-05-31T23:54:54Z] <dereckson@tin> Synchronized php-1.28.0-wmf.3/extensions/GeoData/includes/Hooks.php: Don't index non-Earth coordinates (T136559) (duration: 00m 23s)
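
The idea behind "Don't index non-Earth coordinates" can be sketched roughly as below. This is not the actual GeoData patch (that change is in PHP, in Hooks.php); it is only an illustration in Python. It assumes each coordinate is a dict carrying a `globe` key, mirroring the `coordinates.globe` field queried later in this task, and it assumes that an entry without a `globe` value counts as Earth. The function name is hypothetical.

```python
def earth_only(coordinates):
    """Keep only coordinates whose globe is Earth.

    Assumptions: each coordinate is a dict, the globe name is stored
    under "globe", and a missing "globe" key is treated as "earth".
    """
    return [c for c in coordinates if c.get("globe", "earth") == "earth"]


# Hypothetical sample data; only the 219.38 longitude comes from the task.
coords = [
    {"globe": "mars", "coord": {"lat": 0.0, "lon": 219.38}},
    {"globe": "earth", "coord": {"lat": 48.85, "lon": 2.35}},
]
indexable = earth_only(coords)
```

Here `indexable` contains only the second entry, so the out-of-range Mars longitude never reaches the geo_point mapper.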

I can still see errors. I suppose this happens when we update non-content data such as incoming links: Elasticsearch probably tries to reparse the source from the index and then fails?

We could try to reindex these docs from MySQL using forceSearchIndex. Unfortunately, this script only has --fromId/--toId options, so it may not be very convenient.
On the other hand, only 2577 docs are affected on enwiki according to this query:

{
        "fields":[],
        "query": {
                "nested" : {
                        "path":"coordinates",
                        "query":{
                                "bool":{
                                        "must_not":[
                                                {"term":{"coordinates.globe":"earth"}}
                                        ],
                                        "filter": {"exists":{"field": "coordinates.globe"}}
                                }
                        }
                }
        }
}

@EBernhardson what do you think? Should we add a new option to forceSearchIndex that accepts a list of ids, or is it OK to go for a brute-force approach and run forceSearchIndex for all ids?

Running forceSearchIndex for everything will probably take some time, although it is the easy solution. I'm kind of ambivalent. It would be nice to be able to take a list of ids, but it's not strictly necessary. If we think it would only take an hour or two to write (probably), we could put together a patch that takes the ids to reindex as input. The script is a bit of a mess though, so it might not be so easy, in which case a full reindex isn't the end of the world.
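
A possible middle ground between the two options, sketched here in Python under stated assumptions: collapse the affected page ids into ranges, so that each range could be fed to the existing --fromId/--toId options. The helper name and the `max_gap` parameter are hypothetical, the sample ids (apart from 14640471) are made up, and whether the script treats its bounds as inclusive would need checking before use.

```python
def ids_to_ranges(page_ids, max_gap=0):
    """Collapse page ids into (first, last) ranges.

    An id joins the current range when at most max_gap ids are missing
    between it and the range's last id. A larger max_gap means fewer
    script runs but some unaffected pages get reindexed too.
    """
    ranges = []
    for page_id in sorted(set(page_ids)):
        if ranges and page_id - ranges[-1][1] <= max_gap + 1:
            ranges[-1][1] = page_id  # extend the current range
        else:
            ranges.append([page_id, page_id])  # start a new range
    return [tuple(r) for r in ranges]


# Hypothetical ids; 14640471 is the page id from the task description.
affected = [14640471, 14640472, 14640475, 20000000]
exact = ids_to_ranges(affected)
loose = ids_to_ranges(affected, max_gap=5)
```

With `max_gap=0` only adjacent ids are merged; with `max_gap=5` the first three ids end up in a single range, at the cost of reindexing two unaffected pages in between.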

MaxSem closed this task as Resolved. Jun 7 2016, 5:18 PM