Truncate labels.*.near_match fields
Open, Needs Triage · Public · 2 Estimated Story Points

Description

Seen in logs:

bulk action failed with status BAD_REQUEST: {"index":"testwikidatawiki_content_1756493369","type":"_doc","id":"193822","cause":{"type":"exception","reason":"Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field=\"labels.en.near_match\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[116, ]...', original message: bytes can be at most 32766 in length; got 161999]","caused_by":{"type":"exception","reason":"Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 161999]"}},"status":400}

AC:

  • Wikibase items with very long labels do not fail at index time
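
For illustration only, here is a minimal Python sketch of the kind of truncation that would keep a term under the 32766-byte limit reported in the log. It is not the CirrusSearch change itself: the function name `truncate_utf8` and the sample label are hypothetical, and the real fix is applied in the near_match analyzer rather than in application code.

```python
# Maximum UTF-8 length of a single indexed term, taken from the error message above.
MAX_TERM_BYTES = 32766


def truncate_utf8(text: str, max_bytes: int = MAX_TERM_BYTES) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cutting at a byte offset can split a multi-byte character;
    # errors="ignore" drops the incomplete trailing sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Hypothetical label with the same byte length as the failing term in the log.
label = "t" * 161999
truncated = truncate_utf8(label)
assert len(truncated.encode("utf-8")) <= MAX_TERM_BYTES
```

The limit is measured in bytes, not characters, so a cut-off expressed as a character count has to allow for multi-byte characters to stay under it.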

Event Timeline

Restricted Application added a subscriber: Aklapper.
pfischer set the point value for this task to 2. · Oct 27 2025, 4:31 PM

Change #1199046 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] mapping: add truncation to near_match analyzer

https://gerrit.wikimedia.org/r/1199046

Change #1199046 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] mapping: add truncation to near_match analyzer

https://gerrit.wikimedia.org/r/1199046