
Fatalmonitor on logstash still includes deprecated channel:wfLogDBError
Closed, Resolved · Public · PRODUCTION ERROR

Description

The filter currently seems to be (type:mediawiki AND (channel:exception OR channel:wfLogDBError)) OR type:hhvm, but AFAIK channel:wfLogDBError no longer exists, having been split into channel:DBConnection and channel:DBQuery (both of which have a small amount of noise due to long queries and bad connection patterns) and channel:DBReplication (which has a large amount of noise, as it creates thousands of events every time one server has 1.1 seconds of lag). I do not know who changed it or why, but it is no longer in use.

Happily, I think query failures also create exceptions, so at least some events are monitored. The filter, however, should be either changed or deleted, as the channel:wfLogDBError clause does nothing now.
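
To illustrate the problem described above, here is a minimal Python sketch that models the dashboard filter as a predicate over log events. The event dictionaries and the helper name `matches_old_filter` are hypothetical; only the `type` and `channel` values come from this task. It is not how logstash evaluates queries, only a way to see which events the stale clause lets through.

```python
# Illustrative model only: the real filter is a logstash/Kibana query, not Python.
# `matches_old_filter` and the sample events are hypothetical.

def matches_old_filter(event):
    """(type:mediawiki AND (channel:exception OR channel:wfLogDBError)) OR type:hhvm"""
    is_mediawiki_match = (
        event.get("type") == "mediawiki"
        and event.get("channel") in ("exception", "wfLogDBError")
    )
    return is_mediawiki_match or event.get("type") == "hhvm"


# Events on the channels that replaced wfLogDBError, plus two that still match.
sample_events = [
    {"type": "mediawiki", "channel": "exception"},
    {"type": "mediawiki", "channel": "DBQuery"},
    {"type": "mediawiki", "channel": "DBConnection"},
    {"type": "mediawiki", "channel": "DBReplication"},
    {"type": "hhvm", "channel": "error"},
]

matched = [e for e in sample_events if matches_old_filter(e)]
dropped_channels = sorted(
    e["channel"] for e in sample_events if not matches_old_filter(e)
)
```

In this model nothing is ever emitted on `wfLogDBError`, so that clause never matches, and events on the three newer DB channels fall outside the filter entirely.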

Event Timeline

jcrespo created this task. May 18 2017, 2:05 PM
Restricted Application added a subscriber: Aklapper. May 18 2017, 2:05 PM
demon added a subscriber: demon. Jun 29 2017, 12:01 AM

Dropping this from the saved dashboard should be easy (for someone who knows how to edit these things). Does this mean we can also drop it from our logging channels in InitialiseSettings.php?

I have no idea; you should ask whoever made the change in MediaWiki. Also, do you really want to delete it rather than replace it with the others?

demon added a comment. Jun 29 2017, 5:33 PM

If they have their own channels & dashboards, not necessarily...the slow queries & such are mostly noise (to me, in this dashboard) so I don't really want those back...

My suggestion would be to substitute "channel:wfLogDBError" with a filter that excludes "lost connection to X, reconnecting" and "Read timeout is reached", which are the messages created when there are long-running queries, leaving in only genuinely bad or illegal queries (e.g. queries that have syntax errors, such as T189191).

The average rate of those in the last week is 20 errors every 3 hours, and I don't see anything there that is spam, i.e. that shouldn't be fixed.

DBReplication and DBConnection could be left out, as they create a lot of spam when a single server is down or behind on replication, but MediaWiki should not have a problem with that.

I can create such a filter for you, and revert it if you do not like it. I am OK if you do not want any of that, in which case we should remove the reference to a nonexistent channel.
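
The substitution suggested in this comment could be sketched roughly as follows, again as a Python predicate rather than an actual dashboard query. Several things here are assumptions, not taken from the task: that the replacement clause targets `channel:DBQuery`, that the log text lives in a `message` field, and that the two noisy messages can be recognised by the substrings below. The helper name `matches_proposed_filter` is hypothetical.

```python
# Sketch under stated assumptions: the `message` field name, the choice of
# channel:DBQuery, and the exact noise substrings are NOT confirmed by the task.

NOISE_SUBSTRINGS = ("lost connection to", "read timeout is reached")


def matches_proposed_filter(event):
    """Keep exceptions, hhvm events, and DBQuery errors that are not known noise."""
    if event.get("type") == "hhvm":
        return True
    if event.get("type") != "mediawiki":
        return False

    channel = event.get("channel")
    if channel == "exception":
        return True
    if channel == "DBQuery":
        message = event.get("message", "").lower()
        # Filter out the long-running-query noise, keep everything else.
        return not any(noise in message for noise in NOISE_SUBSTRINGS)
    # DBReplication and DBConnection are deliberately left out.
    return False


kept = matches_proposed_filter(
    {"type": "mediawiki", "channel": "DBQuery", "message": "You have an error in your SQL syntax"}
)
skipped = matches_proposed_filter(
    {"type": "mediawiki", "channel": "DBQuery", "message": "Read timeout is reached"}
)
```

Matching on substrings is fragile if the log wording changes, so a real filter would probably be better keyed on a structured field if one exists.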

mark added a subscriber: mark. Mar 15 2018, 2:37 PM
Bawolff added a subscriber: Bawolff.
jcrespo triaged this task as High priority. Edited · Mar 23 2018, 1:54 PM
jcrespo added a subscriber: Anomie.

This can cause SQL syntax errors, which could lead to security issues such as SQL injections, to be ignored.

jcrespo added a subscriber: greg. Jun 15 2018, 11:43 AM

@greg I think I need your input here: this is the second time bugs have potentially gone unnoticed because of this bug in the monitoring filters. While I understand log spam is a concern, I think I can make it work with my suggestions at T165675#4040272, but I don't want to touch anything without RelEng consensus.

demon removed a subscriber: demon. Jun 15 2018, 6:20 PM
mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:10 PM
fgiunchedi moved this task from Inbox to Radar on the observability board. Dec 9 2019, 11:44 AM
greg removed a subscriber: greg. Dec 9 2019, 11:42 PM
Krinkle closed this task as Resolved. Mar 24 2020, 9:17 PM
Krinkle claimed this task.
Krinkle added a subscriber: Krinkle.

This has been fixed afaik. I didn't encounter any references to it as part of T247113 or T233342.

Aklapper removed a subscriber: Anomie. Oct 16 2020, 5:40 PM