Search broken on beta cluster wikis
Closed, Resolved | Public | 3 Estimated Story Points

Description

Search is currently broken on Beta Wikimedia Commons:

https://commons.wikimedia.beta.wmflabs.org/w/index.php?search=Test
An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later.

It’s been broken since at least 20:48 UTC today (failed ACDC browser test run); it was still working at 20:47 yesterday (successful ACDC browser test run).

It's also broken on other beta wikis, e.g. cs beta wiki: https://cs.wikipedia.beta.wmflabs.org/w/index.php?search=translating&title=Speci%C3%A1ln%C3%AD%3AHled%C3%A1n%C3%AD&go=J%C3%ADt+na&ns0=1&ns100=1&ns102=1&uselang=en results in the same error: An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later.

Event Timeline

kostajh renamed this task from Search broken on Beta Wikimedia Commons to Search broken on beta cluster wikis.Mar 1 2021, 12:51 PM
kostajh updated the task description.
kostajh added subscribers: Xqt, pywikibot-bugs-list.

The following log may help with the investigation:

headers

VERBOSE  pywiki:logging.py:101            headers=
{'Date': 'Mon, 01 Mar 2021 09:02:51 GMT', 'Server': 'deployment-mediawiki-07.deployment-prep.eqiad1.wikimedia.cloud', 'X-Content-Type-Options': 'nosniff', 'MediaWiki-API-Error': 'cirrussearch-backend-error', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Disposition': 'inline; filename=api-result.json', 'Cache-Control': 'private, must-revalidate, max-age=0', 'Vary': 'Accept-Encoding', 'X-Request-Id': 'YDytu6wQBHcAAAv4eE0AAAAB', 'Content-Type': 'application/json; charset=utf-8', 'Content-Encoding': 'gzip', 'Age': '0', 'X-Cache': 'deployment-cache-text06 pass, deployment-cache-text06 pass', 'X-Cache-Status': 'pass', 'Server-Timing': 'cache;desc="pass"', 'Report-To': '{ "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }', 'NEL': '{ "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}', 'X-Client-IP': '104.154.182.187', 'Accept-Ranges': 'bytes', 'Content-Length': '296', 'Connection': 'keep-alive'}

query

VERBOSE  pywiki:logging.py:101 API Error: query=
("{'gsrsearch': ['intitle:wiki'], 'gsrwhat': [None], 'prop': ['info', "
 "'imageinfo', 'categoryinfo'], 'inprop': ['protection'], 'iiprop': "
 "['timestamp', 'user', 'comment', 'url', 'size', 'sha1', 'metadata'], "
 "'iilimit': ['max'], 'generator': ['search'], 'action': ['query'], "
 "'indexpageids': [True], 'continue': [True], 'gsrnamespace': [0], 'gsrlimit': "
 "['10'], 'meta': ['userinfo'], 'uiprop': ['blockinfo', 'hasmsg'], 'maxlag': "
 "['5'], 'format': ['json']}")

response

VERBOSE  pywiki:logging.py:101            response=
{'error': {'code': 'cirrussearch-backend-error', 'info': 'We could not complete your search due to a temporary problem. Please try again later.', 'help': 'See https://en.wikipedia.beta.wmflabs.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes.'}, 'servedby': 'deployment-mediawiki-07'}
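
For anyone scripting around this failure, here is a minimal sketch of recognising it from a response like the one logged above. It relies only on the `MediaWiki-API-Error` header and the `error.code` field visible in the log; the function name `is_search_backend_error` is hypothetical and not part of Pywikibot or MediaWiki.

```python
from typing import Any, Mapping

# Error code as it appears in both the header and the JSON body of the log above.
BACKEND_ERROR = "cirrussearch-backend-error"


def is_search_backend_error(headers: Mapping[str, str], body: Mapping[str, Any]) -> bool:
    """Return True if the headers or the JSON body report the CirrusSearch backend error."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("mediawiki-api-error") == BACKEND_ERROR:
        return True
    # Fall back to the JSON body in case the header is not available.
    error = body.get("error")
    return isinstance(error, dict) and error.get("code") == BACKEND_ERROR
```

A caller could use this to tell "search backend is down" apart from an ordinary bad query, and retry later instead of failing the test run outright.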

I tried looking for details on logstash-beta.wmflabs.org but that doesn’t seem to have any events at all (T233134, I guess?).

Looking at CirrusSearch.log on beta fluorine, it seems Elasticsearch is unreachable (requests fail with 502 Bad Gateway).

Which is indeed the case:

tgr@deployment-deploy01:~$ curl 'https://deployment-elastic05.deployment-prep.eqiad.wmflabs:9243'
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.13.9</center>
</body>
</html>
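
The same check can be sketched in Python. A 502 served by nginx means the proxy answered but got no usable reply from the service behind it, which is presumably Elasticsearch here. The function names and the wording of the diagnoses are illustrative assumptions, not an existing tool.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status from the proxy in front of Elasticsearch to a rough diagnosis."""
    if status in (502, 503, 504):
        # The proxy itself answered, but got no usable reply from upstream.
        return "proxy up, backend unavailable"
    if 200 <= status < 300:
        return "ok"
    return "unexpected status"


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch url and classify the outcome.

    URL-level failures (DNS errors, refused connections) are reported as
    "proxy unreachable" instead of being raised.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the status is in err.code.
        return classify_status(err.code)
    except urllib.error.URLError:
        return "proxy unreachable"
```

Calling `probe()` with the URL from the curl command above would have returned "proxy up, backend unavailable" while the service was down.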

This will be fixed soonish. It looks like a problem with a bad Elasticsearch plugin release that contains multiple versions of the same plugin, along with T276198.

tgr@deployment-elastic05:~$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2021-03-01 08:34:51 UTC; 1 day 8h ago
     Docs: http://www.elastic.co
 Main PID: 26725 (code=exited, status=1/FAILURE)

Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.Files.newByteChannel(Files.java:361)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.Files.newByteChannel(Files.java:407)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.Files.newInputStream(Files.java:152)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:60)
Mar 01 08:34:51 deployment-elastic05 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Mar 01 08:34:51 deployment-elastic05 systemd[1]: elasticsearch.service: Unit entered failed state.
Mar 01 08:34:51 deployment-elastic05 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.

tgr@deployment-elastic05:~$ sudo journalctl -u elasticsearch
-- Logs begin at Mon 2021-03-01 07:07:07 UTC, end at Tue 2021-03-02 16:43:54 UTC. --
Mar 01 08:34:51 deployment-elastic05 systemd[1]: Started Elasticsearch.
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]: /usr/share/elasticsearch/bin/elasticsearch-env: line 78: cd: ${ES_PATH_CONF-/etc/elasticsearch}: No such file or directory
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]: Exception in thread "main" java.nio.file.NoSuchFileException: /usr/share/elasticsearch/jvm.options
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.Files.newByteChannel(Files.java:361)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.Files.newByteChannel(Files.java:407)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at java.nio.file.Files.newInputStream(Files.java:152)
Mar 01 08:34:51 deployment-elastic05 elasticsearch[26725]:         at org.elasticsearch.tools.launchers.JvmOptionsParser.main(JvmOptionsParser.java:60)
Mar 01 08:34:51 deployment-elastic05 systemd[1]: elasticsearch.service: Main process exited, code=exited, status=1/FAILURE
Mar 01 08:34:51 deployment-elastic05 systemd[1]: elasticsearch.service: Unit entered failed state.
Mar 01 08:34:51 deployment-elastic05 systemd[1]: elasticsearch.service: Failed with result 'exit-code'.
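
Most of the journal output above is Java stack frames. The lines that say what went wrong are the two from the elasticsearch process that are not frames. Here is a rough sketch of filtering for those, assuming the syslog-style line prefix shown above; the helper name `root_cause_lines` is hypothetical.

```python
import re
from typing import List

# Prefix as in the journal output above: "Mar 01 08:34:51 host unit[pid]: message"
LINE_RE = re.compile(
    r"^\w{3} \d{2} [\d:]{8} (?P<host>\S+) (?P<unit>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def root_cause_lines(journal_text: str, unit: str = "elasticsearch") -> List[str]:
    """Return messages logged by `unit`, skipping Java stack frames."""
    causes = []
    for line in journal_text.splitlines():
        match = LINE_RE.match(line)
        if match is None or match.group("unit") != unit:
            # Journal banner lines and messages from other units (e.g. systemd).
            continue
        message = match.group("msg")
        # Stack frames are indented and start with "at "; they add no new cause.
        if message.lstrip().startswith("at "):
            continue
        causes.append(message.strip())
    return causes
```

Run on the output above, this would keep only the `cd: ... No such file or directory` line and the `NoSuchFileException` line for `/usr/share/elasticsearch/jvm.options`.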

Search is functional again.