Page MenuHomePhabricator

Database error messages are not useful in logstash
Closed, DeclinedPublicPRODUCTION ERROR

Description

(checked in beta logastash but I imagine the production config is the same)

A DB error results in two events (DBQuery and exception), neither of which actually shows the error message in a readable format.

The raw logstash records are fine:

https://logstash-beta.wmflabs.org/app/kibana#/doc/logstash-*/logstash-2017.02.08/mediawiki/?id=AVogBUK0Ysi3jGukZ3F6

{
  "_index": "logstash-2017.02.08",
  "_type": "mediawiki",
  "_id": "AVogBUK0Ysi3jGukZ3F6",
  "_score": 1,
  "_source": {
    "message": "CentralAuthUser::localUserData\t10.68.18.35\t1054\tUnknown column 'ug_expiry' in 'where clause' (10.68.18.35)\tSELECT  ug_group  FROM `user_groups`    WHERE ug_user = '300' AND (ug_expiry IS NULL OR ug_expiry >= '20170208231819')  ",
    "@version": 1,
    "@timestamp": "2017-02-08T23:18:19.000Z",
    "type": "mediawiki",
    "host": "deployment-mediawiki04",
    "level": "ERROR",
    "tags": [
      "syslog",
      "es",
      "es"
    ],
    "channel": "DBQuery",
    "shard": "s3",
    "normalized_message": "{fname}\t{db_server}\t{errno}\t{error}\t{sql1line}",
    "url": "/w/index.php?title=Special%3ACentralAuth&target=Tgr",
    "ip": "10.68.18.103",
    "http_method": "GET",
    "server": "en.wikipedia.beta.wmflabs.org",
    "referrer": "https://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralAuth",
    "unique_id": "WJunOwpEE4AAAAeY6NEAAAAM",
    "wiki": "enwiki",
    "mwversion": "1.29.0-alpha",
    "reqId": "WJunOwpEE4AAAAeY6NEAAAAM",
    "db_server": "10.68.18.35",
    "db_name": "ee_prototypewiki",
    "db_user": "wikiuser",
    "method": "Database::reportQueryError",
    "errno": 1054,
    "error": "Unknown column 'ug_expiry' in 'where clause' (10.68.18.35)",
    "sql1line": "SELECT  ug_group  FROM `user_groups`    WHERE ug_user = '300' AND (ug_expiry IS NULL OR ug_expiry >= '20170208231819')  ",
    "fname": "CentralAuthUser::localUserData"
  },
  "fields": {
    "@timestamp": [
      1486595899000
    ]
  }
}

https://logstash-beta.wmflabs.org/app/kibana#/doc/logstash-*/logstash-2017.02.08/mediawiki/?id=AVogBUK0Ysi3jGukZ3F7

{
  "_index": "logstash-2017.02.08",
  "_type": "mediawiki",
  "_id": "AVogBUK0Ysi3jGukZ3F7",
  "_score": 1,
  "_source": {
    "message": "A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? \nQuery: SELECT  ug_group  FROM `user_groups`    WHERE ug_user = '300' AND (ug_expiry IS NULL OR ug_expiry >= '20170208231819')  \nFunction: CentralAuthUser::localUserData\nError: 1054 Unknown column 'ug_expiry' in 'where clause' (10.68.18.35)\n",
    "@version": 1,
    "@timestamp": "2017-02-08T23:18:19.000Z",
    "type": "mediawiki",
    "host": "deployment-mediawiki04",
    "level": "ERROR",
    "tags": [
      "syslog",
      "es",
      "es",
      "exception-json"
    ],
    "channel": "exception",
    "shard": "s3",
    "normalized_message": "{\"id\":\"WJunOwpEE4AAAAeY6NEAAAAM\",\"type\":\"DBQueryError\",\"file\":\"/srv/mediawiki/php-master/includes/libs/rdbms/database/Database.php\",\"line\":1057,\"message\":\"A database query error has occurred. Did you forget to run your application's database schema update",
    "url": "/w/index.php?title=Special%3ACentralAuth&target=Tgr",
    "ip": "10.68.18.103",
    "http_method": "GET",
    "server": "en.wikipedia.beta.wmflabs.org",
    "referrer": "https://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralAuth",
    "unique_id": "WJunOwpEE4AAAAeY6NEAAAAM",
    "wiki": "enwiki",
    "mwversion": "1.29.0-alpha",
    "reqId": "WJunOwpEE4AAAAeY6NEAAAAM",
    "private": true,
    "file": "/srv/mediawiki/php-master/includes/libs/rdbms/database/Database.php",
    "line": 1057,
    "code": 0,
    "caught_by": "mwe_handler",
    "backtrace": [ .. ],
    "exception_id": "WJunOwpEE4AAAAeY6NEAAAAM",
    "class": "mediawiki",
    "message_checksum": "abf09740ec54e49a772a9260dc6e1e26"
  },
  "fields": {
    "@timestamp": [
      1486595899000
    ]
  }
}

but then the dashboard just shows A database query error has occurred. Did you forget to run your application's database schema updater after up... and CentralAuthUser::localUserData 10.68.18.35 1054 Unknown column 'ug_expiry' in 'where clause' (10.68.18.35) SEL... respectively. The stack trace does get displayed in full, so this not some software-level kibana limitation.

Event Timeline

@bd808 is string truncation something that can be controlled in logstash/kibana somewhere? We could use LogstashFormatter to break out the DB error message / SQL query into separate fields if those can be configured to be formatted more permissively.

Whatever is happening is happening in the Kibana view layer. I honestly have no idea if the column widths are configurable in any way or not.

Sometimes the message is ridiculously long though. Compare https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2017.02.22/mediawiki/?id=AVpoNELslL3ktRIGdB9S .
Maybe it's based on line length, not string length?

It seems Kibana can truncate strings but is not configured to do that with message: https://logstash.wikimedia.org/app/kibana#/settings/indices/logstash-*/field/message
(There is also truncate:maxHeight which probably controls the fadeout thing in listings, but nothing similar for width.)
The eventlist object does not seem to have any special configuration either.

@Tgr This task is fairly generic, but if reduce to the truncation issue, can you confirm whether that is still an active issue? We've made a fair number of changes to how logging works in the past year.

Krinkle triaged this task as Medium priority.Jul 11 2018, 1:37 AM
Krinkle changed Risk Rating from N/A to default.
mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:10 PM