Page MenuHomePhabricator

Unreliable message:"something" search since last Elastic upgrade
Open, Needs TriagePublic

Description

By far the most common queries MW developers do on MW-related logstash dashboard (that is, after initial one-time setup for type and channel) is to search for partial substrings of diagnostic messages. Substrings because 1) we are lazy and 2) there may be parts at the start or end of the message that are variable.

This has of course never strictly been a substring match, but so long as the substring start and ends at word boundaries, it has (in my experience) never failed to find what I was looking for.

I assume this is a configuration problem of some kind or maybe default settings changed that we didn't know we were using and may need to set explicitly now. I won't report each of these since I run into several every day. But I've filed one about meta.domain in T260838, and this onen will be for the far more common use case of matching against message:.

As part of T260838, we are investigating errors that involve a message like the following being logged:

message
Error 1213: Deadlock found when trying to get lock; try restarting transaction
Function: MediaWiki\Block\DatabaseBlockStore::insertBlock
Query: INSERT IGNORE INTO `ipblocks` (ipb_address,ipb_user,ipb_timestamp,ipb_auto,ipb_anon_only,ipb_create_account,ipb_enable_autoblock,ipb_expiry,ipb_range_start,ipb_range_end,ipb_deleted,ipb_block_email,ipb_allow_usertalk,ipb_parent_block_id,ipb_sitewide,ipb_reason_id,ipb_by_actor) VALUES (<redacted>)

The most stable and closely correlated part of this value is Function: MediaWiki\Block\DatabaseBlockStore::insertBlock which is a string that nearly uniquely identifies the logical query in MW. Typically one would open up the mediawiki-errors dashboard in Logstash and search for message:"MediaWiki\Block\DatabaseBlockStore::insertBlock". Internally, my naive understanding was that this will probably end up normalizing it to some extend and so it might in theory match results that have each of these "words" as well and in the same order even without the specific casing or special chars, but that's fine.

But, since the upgrade, this returns no results at all. Interestingly, some other variations of the query, however, do return results. For example, searching for:

message:"INSERT IGNORE INTO `ipblocks`"

Does yield results.

https://logstash.wikimedia.org/goto/64abb2f021d20ed31864d50c43c898d7

And these results include:

result
[{reqId}] {exception_url} Wikimedia\Rdbms\DBQueryError: Error 1213: Deadlock found when trying to get lock; try restarting transaction Function: MediaWiki\Block\DatabaseBlockStore::insertBlock Query: INSERT IGNORE INTO `ipblocks` (ipb_addr

Where the same string appears as the one we asked for previously.

Event Timeline

The issue is still reproducible with the given example of multi-line messages that contain something like "Function: Foo\Bar\Baz::quux". This week, there's a number of messages like this:

A connection error occurred during a query. 
Query: SELECT 1 AS ping
Function: Wikimedia\Rdbms\Database::ping
Error: 2006 MySQL server has gone away

When searching on the Logstash: mediawiki dashboard for message:"Function" or message:"Database::ping" (which no human would, but just to prove that the field is findable in at least some form) - these rows are returned. But, when searching for message:"Function: Wikimedia\Rdbms\Database::ping" there are zero results.

The issue is still reproducible with the given example of multi-line messages that contain something like "Function: Foo\Bar\Baz::quux". This week, there's a number of messages like this:

A connection error occurred during a query. 
Query: SELECT 1 AS ping
Function: Wikimedia\Rdbms\Database::ping
Error: 2006 MySQL server has gone away

When searching on the Logstash: mediawiki dashboard for message:"Function" or message:"Database::ping" (which no human would, but just to prove that the field is findable in at least some form) - these rows are returned. But, when searching for message:"Function: Wikimedia\Rdbms\Database::ping" there are zero results.

Usually the special characters (semicolons and slashes) are excluded from the index. For this case the parser may be treating the slashes as escape characters. Does message:"Function: Wikimedia\\Rdbms\\Database::ping" return more helpful results?

Does the previous comment resolve the issue?

@colewhite

  1. Logstash: mediawiki-errors dashboard.
  2. Query message:Function
  3. Pick one, and copy its "Function" line as example.
  4. Query message:"Function: MediaWiki\Extension\DynamicPageList\Hooks::processQuery" — yields no results.
  5. Manually adding in a number of backlashes as message:"Function: MediaWiki\\Extension\\DynamicPageList\\Hooks::processQuery" — yields expected results.

I couldn't find any upstream feature requests for this, however I did look into the DQL option, which is an alternate way to interpret the search, rather than as Lucene query string.

https://opensearch.org/docs/2.9/dashboards/discover/dql/
https://opensearch.org/docs/latest/query-dsl/full-text/query-string/

It look like DQL doesn't require escaping, or at least not this particular kind. In fact, under DQL it finds the same results both as-is and with double backslashes.