
Make logging work for mediawiki in k8s
Closed, ResolvedPublic

Description

Placeholder task since there doesn't appear to be one yet.

Stuff that comes to mind:

  • MediaWiki PSR/Monolog logs go to Logstash under type:mediawiki. This covers exceptions, fatals, and various diagnostic channels.
  • MediaWiki logs go to mwlog1001 files. (Test plan: XWD api.php queries go only to api.log; fatal-error.php hits go to both Logstash and mwlog.)
  • php-wmerrors fatal errors go from /etc/php/php7-fatal-error.php to Logstash under type:mediawiki channel:exception caught_by:php-wmerrors.
  • php-fpm stderr goes to Logstash under type:syslog program:php7.2-fpm.

Related Objects

Event Timeline

fgiunchedi triaged this task as Medium priority. Aug 30 2021, 8:02 AM

Using fatal-error.php I determined that at the moment logging to mwlog1001 works, while it seems that we're not able to log to logstash. I strongly suspect this is due to some missing egress rules.

Other than that:

  • We need to add /etc/php/php7-fatal-error.php to the mediawiki image
  • php stderr currently gets displayed in the response to the request, which is doubly wrong.

@Joe @Krinkle What's the reason php7-fatal-error.php is in /etc/php (via operations/puppet) and not in operations/mediawiki-config ?

@dancy TLDR: It could probably be moved. I'll ramble a bit about what I currently understand, some of which you know already; these may or may not be good reasons for the status quo.

  • The file is logically executed outside MediaWiki context, referenced from the C code in the native php-wmerrors extension for PHP. It is not invoked or referenced anywhere by "us" at runtime, and may not refer to anything from MediaWiki, multiversion, or wmf-config.
  • The php-wmerrors file is only for "really bad" errors. The vast majority of errors are sent to syslog by MediaWiki/Monolog, which then go to rsyslog/kafka/logstash. The php-wmerrors file mimics Monolog's syslog messages for edge cases where PHP is unable to let the application report the error, and instead falls back to php-wmerrors.
  • I think the idea is also that the script should be standalone, yet still discover certain settings and services to send information to, which are perhaps more reliable to inject at "build time" through ERB with puppet.
  • Also touching on the idea that it might be considered an anti-pattern for manifests to provision a server with software and settings that refer to files that aren't ensured by that same manifest. Having said that, we do kind of do this already for Apache, which refers to /w/index.php and /w/robots.php, and those seem quite natural/required/unavoidable. I suppose the PHP extension feels more standalone and abstractable. It can be removed without affecting MW in any way, perhaps emotionally closer to how we provision Envoy, Mcrouter, and other software external to MW.

Having said that, we technically could move it to wmf-config, and could find an alternative way to discover Statsd. Possibly through an environment variable. Another way might be to consume the *Services.php files, but that would imho muddy the waters and be potentially more risky. php-wmerrors' primary purpose is to be the last part standing in the face of severe errors that couldn't be handled at any other layer, so it "working" in the common case would not be important because in the common case it isn't actually invoked.

The only real reason why we've used puppet there was to inject the statsd address easily IIRC.
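
To make the environment-variable idea above concrete, here is a minimal, hypothetical sketch of how a standalone fatal-error script could discover statsd without puppet templating. The STATSD_HOST / STATSD_PORT variable names and the metric name are assumptions for illustration, not anything we currently set.

<?php
// Hypothetical sketch only: discover the statsd address from environment
// variables (names invented here) instead of having puppet template it into
// the file at build time, keeping the script standalone and outside MediaWiki.
$statsdHost = getenv( 'STATSD_HOST' ) ?: 'localhost';
$statsdPort = (int)( getenv( 'STATSD_PORT' ) ?: 8125 );

// Fire-and-forget UDP counter; failures are deliberately ignored because this
// code runs while PHP is already handling a fatal error.
$sock = @socket_create( AF_INET, SOCK_DGRAM, SOL_UDP );
if ( $sock !== false ) {
    $metric = 'MediaWiki.fatal_errors:1|c'; // illustrative metric name
    @socket_sendto( $sock, $metric, strlen( $metric ), 0, $statsdHost, $statsdPort );
    socket_close( $sock );
}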

@Krinkle we do already include a "params" file; I think we can just keep including it in a special directory in mediawiki-config.

Change 721333 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/docker-images/production-images@master] Add configuration for wmerrors to php-multiversion-base

https://gerrit.wikimedia.org/r/721333

Change 721341 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] mediawiki: allow injecting the wmerrors script

https://gerrit.wikimedia.org/r/721341

Change 721342 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] mediawiki::web::yaml_defs: inject php7-fatal-error.php in k8s

https://gerrit.wikimedia.org/r/721342

The set of patches above should allow us to get wmerrors working; we can work on moving php7-fatal-error.php to mediawiki-config separately.

Change 721333 merged by Giuseppe Lavagetto:

[operations/docker-images/production-images@master] Add configuration for wmerrors to php-multiversion-base

https://gerrit.wikimedia.org/r/721333

Coming to logstash: right now on bare metal we relay the logs to rsyslogd, talking to it via TCP on localhost. This is not possible on kubernetes, unless we install a sidecar that acts as a syslog relay to actually relay the logs to the physical node's rsyslogd.

An alternative worth exploring is making Monolog log to stderr via e.g. error_log, but that needs some investigation. From IRC: we have currently customised our Syslog handler quite a bit, so we would need to adapt the Monolog StreamHandler accordingly.

Joe updated the task description.
Joe updated the task description.
Joe updated the task description.

There is also monolog ErrorLogHandler which might be more idiomatic, but we'll have to see if one has notable benefits over the other in terms of overhead, reliability, or feature-compatibility.
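
For comparison, a minimal sketch of the two approaches (assuming a plain Monolog 2.x setup, not our actual customised wmf-config handlers):

<?php
// Minimal sketch, not our production wiring: two ways Monolog could write to
// stderr in a container, shown side by side for comparison only.
require_once __DIR__ . '/vendor/autoload.php';

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\ErrorLogHandler;

// Option A: StreamHandler writing directly to the stderr stream.
$viaStream = new Logger( 'api' );
$viaStream->pushHandler( new StreamHandler( 'php://stderr', Logger::INFO ) );

// Option B: ErrorLogHandler, which goes through PHP's error_log() and therefore
// honours whatever error_log target php / php-fpm is configured with.
$viaErrorLog = new Logger( 'api' );
$viaErrorLog->pushHandler( new ErrorLogHandler( ErrorLogHandler::OPERATING_SYSTEM, Logger::INFO ) );

$viaStream->info( 'example message via StreamHandler' );
$viaErrorLog->info( 'example message via ErrorLogHandler' );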

via TCP on localhost.

UDP, not TCP (I am just being pedantic, I know).

Change 721341 merged by Giuseppe Lavagetto:

[operations/deployment-charts@master] mediawiki: allow injecting the wmerrors script

https://gerrit.wikimedia.org/r/721341

Change 721342 merged by Giuseppe Lavagetto:

[operations/puppet@production] mediawiki::web::yaml_defs: inject php7-fatal-error.php in k8s

https://gerrit.wikimedia.org/r/721342

Change 725005 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/docker-images/production-images@master] Add rsyslog image

https://gerrit.wikimedia.org/r/725005

After much deliberation, @akosiaris and I decided we'll try to go the following way:

  • Install an rsyslogd sidecar that will be used by mediawiki for relaying logs to kafka
  • Use the same rsyslogd for capturing php-fpm logs and logs generated by php-wmerrors (still unsure if we will just use a unix socket for that or if we might be able to write to udp too)
  • Possibly use this rsyslogd to also funnel the apache logs to a specific kafka topic; although keeping the apache logs going to stdout and configuring rsyslog on the physical node to direct those to that kafka topic is also a possibility

Change 725892 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] mediawiki: Add rsyslog sidecar

https://gerrit.wikimedia.org/r/725892

Change 725005 merged by Giuseppe Lavagetto:

[operations/docker-images/production-images@master] Add rsyslog image

https://gerrit.wikimedia.org/r/725005

Change 725892 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Add rsyslog sidecar

https://gerrit.wikimedia.org/r/725892

Change 730967 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] kubernetes::deployment_server::mediawiki: add logging configuration

https://gerrit.wikimedia.org/r/730967

Change 730967 merged by Giuseppe Lavagetto:

[operations/puppet@production] kubernetes::deployment_server::mediawiki: add logging configuration

https://gerrit.wikimedia.org/r/730967

The php-fpm logs are output to stderr, which goes to logstash at the moment using the physical node rsyslog, but it's under a different set of labels, and to have a simple way to select them, we would need to use a static name for the mediawiki container (which isn't a bad idea overall).

We also have another problem: how to treat and collect php slow logs. Right now I'm sending them to stderr but that gets us a lot of noise and broken logging on logstash.

One possible solution to all of our problems would be:

  • let php-fpm log to two files, both in the same /var/log/php-fpm directory
  • Mount that directory as an emptydir volume in both the mediawiki and the rsyslog container
  • Have rules in rsyslog to process those files separately
  • Modify php-fatal-error.php for kubernetes to send its data to the udp port 10514, like monolog does
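
As a rough illustration of the last bullet, a hypothetical sketch of sending one syslog-style datagram to the rsyslog sidecar on UDP port 10514 from plain PHP; the priority and payload layout here are made up for illustration and do not match our exact Monolog formatting.

<?php
// Hypothetical sketch: ship one syslog-style message to the rsyslog sidecar
// over UDP on localhost:10514. Priority 131 below is local0.err (16*8 + 3);
// the payload layout is illustrative, not our exact production format.
$priority = 16 * 8 + 3;
$payload = sprintf(
    '<%d>%s %s php7-fatal-error: %s',
    $priority,
    date( 'M j H:i:s' ),
    gethostname(),
    json_encode( [ 'channel' => 'exception', 'message' => 'example fatal' ] )
);

$sock = @socket_create( AF_INET, SOCK_DGRAM, SOL_UDP );
if ( $sock !== false ) {
    @socket_sendto( $sock, $payload, strlen( $payload ), 0, '127.0.0.1', 10514 );
    socket_close( $sock );
}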

One possible solution to all of our problems would be:

  • let php-fpm log to two files, both in the same /var/log/php-fpm directory
  • Mount that directory as an emptydir volume in both the mediawiki and the rsyslog container
  • Have rules in rsyslog to process those files separately
  • Modify php-fatal-error.php for kubernetes to send its data to the udp port 10514, like monolog does

I think this is a good idea on how to improve this situation.

We also have another problem: how to treat and collect php slow logs. Right now I'm sending them to stderr but that gets us a lot of noise and broken logging on logstash.

Those are also not structured logs, but rather a mess. Is the approach also meant to give them structure or just to fix the issue with the set of labels?

We also have another problem: how to treat and collect php slow logs. Right now I'm sending them to stderr but that gets us a lot of noise and broken logging on logstash.

Those are also not structured logs, but rather a mess. Is the approach also meant to give them structure or just to fix the issue with the set of labels?

I looked for solutions, and at the very least we can collect the full trace in a single log message on logstash using rsyslog rules for their file. We can elaborate from there on how much structure we want to extract from them.

Right now even just getting a full stack trace in a single log message doesn't happen.

We can probably define a structure for them but I'm not really an rsyslog expert and I'll have to look into how much we can extract.

The php-fpm logs are output to stderr, which goes to logstash at the moment using the physical node rsyslog, but it's under a different set of labels, and to have a simple way to select them, we would need to use a static name for the mediawiki container (which isn't a bad idea overall).

We also have another problem: how to treat and collect php slow logs. Right now I'm sending them to stderr but that gets us a lot of noise and broken logging on logstash.

One possible solution to all of our problems would be:

  • let php-fpm log to two files, both in the same /var/log/php-fpm directory
  • Mount that directory as an emptydir volume in both the mediawiki and the rsyslog container
  • Have rules in rsyslog to process those files separately

Btw, that won't work. emptyDir volumes are ephemeral and follow the pod's lifecycle. Plus they aren't addressable by the node in any kind of sane way. You want hostPath and you want to make it unique per pod. Unless rsyslog is the sidecar, not the node's rsyslog.

  • Modify php-fatal-error.php for kubernetes to send its data to the udp port 10514, like monolog does

The php-fpm logs are output to stderr, which goes to logstash at the moment using the physical node rsyslog, but it's under a different set of labels, and to have a simple way to select them, we would need to use a static name for the mediawiki container (which isn't a bad idea overall).

We also have another problem: how to treat and collect php slow logs. Right now I'm sending them to stderr but that gets us a lot of noise and broken logging on logstash.

One possible solution to all of our problems would be:

  • let php-fpm log to two files, both in the same /var/log/php-fpm directory
  • Mount that directory as an emptydir volume in both the mediawiki and the rsyslog container
  • Have rules in rsyslog to process those files separately

Btw, that won't work. emptyDir volumes are ephemeral and follow the pod's lifecycle. Plus they aren't addressable by the node in any kind of sane way. You want hostPath and you want to make it unique per pod. Unless rsyslog is the sidecar, not the node's rsyslog.

Yes the idea was to use the rsyslog in the sidecar; both these logs are sparse enough that it shouldn't be an issue with resource starvation of any kind.

The php-fpm logs are output to stderr, which goes to logstash at the moment using the physical node rsyslog, but it's under a different set of labels, and to have a simple way to select them, we would need to use a static name for the mediawiki container (which isn't a bad idea overall).

We also have another problem: how to treat and collect php slow logs. Right now I'm sending them to stderr but that gets us a lot of noise and broken logging on logstash.

One possible solution to all of our problems would be:

  • let php-fpm log to two files, both in the same /var/log/php-fpm directory
  • Mount that directory as an emptydir volume in both the mediawiki and the rsyslog container
  • Have rules in rsyslog to process those files separately

Btw, that won't work. emptyDir volumes are ephemeral and follow the pod's lifecycle. Plus they aren't addressable by the node in any kind of sane way. You want hostPath and you want to make it unique per pod. Unless rsyslog is the sidecar, not the node's rsyslog.

Yes the idea was to use the rsyslog in the sidecar; both these logs are sparse enough that it shouldn't be an issue with resource starvation of any kind.

OK, +1 on the plan then.

Change 732641 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/docker-images/production-images@master] php-fpm: Allow changing location of the log files

https://gerrit.wikimedia.org/r/732641

Change 732641 merged by Giuseppe Lavagetto:

[operations/docker-images/production-images@master] php-fpm: Allow changing location of the log files

https://gerrit.wikimedia.org/r/732641

I have a (supposedly) working set of normalizing rules to interpret and structure both the php-fpm error log (mostly uninteresting) and the php-fpm slow log (*very* interesting).

I have a few questions about what to do with these parsed logs. I assume we'll ship them to logstash via kafka, but:

  • What topic should I use on kafka?
  • Is there a limit on the size of the individual message? some of the traces are quite big
  • At the moment the stack trace is parsed as an ordered list of trace signals (see P17591) - we would need to aggregate these, but I'm unsure of how they'd be searchable/filtered in logstash. The type of question we would like to answer is probably something along the lines of "is the same file:line / function in the first trace of the stack for more than X% of the slow requests?". Any suggestions to improve our output?

I'll shamelessly loop in @colewhite as I suspect he has better answers to the above questions than I do :)

  • What topic should I use on kafka?

We talked offline a bit. Although I could not find it in the runbooks, I was told the slowlog is one of the first things to look at for appserver incident response. Given that, a topic that will be ingested faster is appropriate. We discussed giving these logs at least an "error" level if they come in via the normal logging rsyslog pipeline (manifesting as the rsyslog-error topic). Since this configuration is separate from host-level rsyslog and one of the use cases is kafkacat, it seems preferable to give it its own topic. I recommend something like rsyslog-phpslowlog or similar, since topics prefixed by rsyslog- will be automatically picked up by Logstash.

  • Is there a limit on the size of the individual message? some of the traces are quite big

I was told they could reach 1MB in size. This is well below the maximum for ES, but there may be other size limitations in upstream components. We'll want to watch for instances of dropped slowlogs.

  • At the moment the stack trace is parsed as an ordered list of trace signals (see P17591) - we would need to aggregate these, but I'm unsure of how they'd be searchable/filtered in logstash. The type of question we would like to answer is probably something along the lines of "is the same file:line / function in the first trace of the stack for more than X% of the slow requests?". Any suggestions to improve our output?

There are a few options and it's not clear which is the best.
I'll limit recommendations to the well-defined ECS schema. The legacy logstash index pattern could do options 1 and 3, but there would be no documentation.

  1. Stack trace as a single, newline-delimited blob of Text and Keyword. This is the default "recommended" place in ECS (see error.stack_trace) which stores the whole text field for view but only indexes the first 1024 bytes. This limits our ability to query/filter it to the first 1024 bytes and does not help answer "what is the most popular first trace file:line/function for this period of time?". We could work around this by creating another field that extracts the first trace file:line/function out (excluding address) into a Keyword which would support aggregation.
  2. Stack trace as array of Object<Keyword>. There are no fields that support this in ECS right now, but we could add one that uses Nested type. It still doesn't answer "what is the most popular first trace file:line/function for this period of time?" but it does enable us to query the entire stack trace. Downside is by separating the file, line, and function fields, we cannot re-assemble them at query time for determining the most popular first trace. We could work around this by creating another field that has this data assembled in a Keyword according to a format (e.g. <file>:<line> <function>). The format would be documented.
  3. Stack trace as array of Keywords. There are no fields that support this form in ECS at this time, but we could add one of Keyword type and the address field would need to be dropped. It still doesn't answer "what is the most popular first trace file:line/function for this period of time?" but it does enable us to query the entire stack trace. We'll still need to copy out the first trace file:line/function to its own field to answer the question. Downside is we'll have to decide on a format, e.g. <file>:<line> <function>. The format would be documented.

If we need to be able to query/filter the entire stack trace, then option 3 is probably the right solution. This would require dropping the address field.
If we need to maintain the address field, then option 2.
If we don't need to query/filter the whole stack trace, then option 1.
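
For illustration only (the real parsing happens in rsyslog rules, not in PHP): pulling the first frame of a raw php-fpm slowlog trace out as a "<file>:<line> <function>" keyword, the kind of workaround field described in option 1. The frame format assumed here is the usual php-fpm slowlog layout.

<?php
// Illustration only: production parsing is done with rsyslog rules, not PHP.
// Given a newline-delimited php-fpm slowlog trace, extract the first frame as
// a "<file>:<line> <function>" keyword suitable for aggregation in Logstash.
function headFrame( string $stackTrace ): ?string {
    foreach ( explode( "\n", $stackTrace ) as $line ) {
        // Typical slowlog frame: "[0x00007f...] functionName() /path/File.php:123"
        if ( preg_match( '/^\[0x[0-9a-f]+\]\s+(\S+)\s+(\S+):(\d+)/', trim( $line ), $m ) ) {
            return "{$m[2]}:{$m[3]} {$m[1]}";
        }
    }
    return null;
}

$example = "[0x00007f3cb2220eb0] splitRawTemplate() /srv/mediawiki/includes/parser/PPFrame_Hash.php:268\n"
    . "[0x00007f3cb2220cd0] expand() /srv/mediawiki/includes/parser/Parser.php:3272";
echo headFrame( $example ), "\n";
// Prints: /srv/mediawiki/includes/parser/PPFrame_Hash.php:268 splitRawTemplate()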

As we make these decisions, I'd love if we could keep T291645: Integrate Event Platform and ECS logs in mind.

What topic should I use on kafka?

I support a separate topic too! Can we do something that would be consistent with other mediawiki stream names? We can declare streams that are composed of any topic(s) in EventStreamConfig, but the convention is to keep the names similar. E.g. the mediawiki.client.error stream is made up of topics eqiad.mediawiki.client.error and codfw.mediawiki.client.error. Could we call this something like mediawiki.log or mediawiki.slowlog or some variation?

Stack trace as a single, newline-delimited blob of Text and Keyword.

This is what the mediawiki/client/error schema does.

I recall a lot of back and forth on how to represent the stack trace in that schema, but I can't find the context atm. @jlinehan might be able to provide more?

As we make these decisions, I'd love if we could keep T291645: Integrate Event Platform and ECS logs in mind.

What topic should I use on kafka?

I support a separate topic too! Can we do something that would be consistent with other mediawiki stream names? We can declare streams that are composed of any topic(s) in EventStreamConfig, but the convention is to keep the names similar. E.g. the mediawiki.client.error stream is made up of topics eqiad.mediawiki.client.error and codfw.mediawiki.client.error. Could we call this something like mediawiki.log or mediawiki.slowlog or some variation?

Just to clarify, this is not a Mediawiki-generated trace, but rather something we obtain from php-fpm. I was thinking of using something like php-fpm.mediawiki.slowlog. Also, this is the logging kafka we're talking about.

Stack trace as a single, newline-delimited blob of Text and Keyword.

This is what the mediawiki/client/error schema does.

I recall a lot of back and forth on how to represent the stack trace in that schema, but I can't find the context atm. @jlinehan might be able to provide more?

Ok, this would be interesting to keep for consistency. But I'm more interested in extracting the head of the stack for pattern observations, and I'm not sure how to do both.

@colewhite my logs will (primarily) come from kubernetes; I don't see any kubernetes.* in the ECS docs, but I do need to add those tags like we do for all logs coming from kubernetes; will the resulting log message still be possibly ecs-compliant?

Anyways, I found a way to extract the first element from the stacktrace with the parsing I've done up to now; I'll send a patch your way and we can work starting from there.

Just to clarify, this is not a Mediawiki-generated trace, but rather something we obtain from php-fpm. I was thinking of using something like php-fpm.mediawiki.slowlog

Ah got it. Sounds good to me!

Change 734692 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/deployment-charts@master] mediawiki: add handling of php-fpm logs via rsyslogd

https://gerrit.wikimedia.org/r/734692

The solution I adopted for the stacktrace is to register the head file:line:function triplet in error.message, so it's easy to aggregate over, and also to store the full stacktrace, JSON-encoded, in error.stack_trace.

This is an example message:

{
  "timestamp": "2021-10-26T15:42:19.534886+00:00",
  "host.name": "8f89b3ab3d47",
  "log.syslog.severity.code": "5",
  "log.syslog.severity.name": "notice",
  "log.level": "notice",
  "log.syslog.facility.code": "16",
  "log.syslog.facility.name": "local0",
  "log.syslog.priority": "133",
  "service.type": "php-fpm-slowlog",
  "ecs.version": "1.7.0",
  "kubernetes": {
    "host": "pu",
    "namespace_name": "pinkunicorn",
    "pod_name": "some-pod",
    "labels": {
      "deployment": "some-deployment",
      "release": "some-release"
    }
  },
  "error": {
    "message": "/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/PPFrame_Hash.php:268:splitRawTemplate()",
    "stack_trace": "[ { \"line\": \"268\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/PPFrame_Hash.php\", \"function\": \"splitRawTemplate()\", \"address\": \"0x00007f3cb2220eb0\" }, { \"line\": \"3272\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"expand()\", \"address\": \"0x00007f3cb2220cd0\" }, { \"line\": \"276\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/PPFrame_Hash.php\", \"function\": \"braceSubstitution()\", \"address\": \"0x00007f3cb22209b0\" }, { \"line\": \"2917\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"expand()\", \"address\": \"0x00007f3cb22207d0\" }, { \"line\": \"1584\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"replaceVariables()\", \"address\": \"0x00007f3cb2220700\" }, { \"line\": \"856\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"internalParse()\", \"address\": \"0x00007f3cb2220640\" }, { \"line\": \"73\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/extensions/Cite/src/ReferencesFormatter.php\", \"function\": \"recursiveTagParse()\", \"address\": \"0x00007f3cb22205c0\" }, { \"line\": \"495\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/extensions/Cite/src/Cite.php\", \"function\": \"formatReferences()\", \"address\": \"0x00007f3cb2220470\" }, { \"line\": \"467\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/extensions/Cite/src/Cite.php\", \"function\": \"formatReferences()\", \"address\": \"0x00007f3cb22203c0\" }, { \"line\": \"414\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/extensions/Cite/src/Cite.php\", \"function\": \"guardedReferences()\", \"address\": \"0x00007f3cb22202b0\" }, { \"line\": \"72\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/extensions/Cite/src/Hooks/CiteParserTagHooks.php\", \"function\": \"references()\", \"address\": \"0x00007f3cb2220200\" }, { \"line\": \"3964\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"references()\", \"address\": \"0x00007f3cb22200d0\" }, { \"line\": \"1163\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/CoreParserFunctions.php\", \"function\": \"extensionSubstitution()\", \"address\": \"0x00007f3cb221ff50\" }, { \"line\": \"3398\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"tagObj()\", \"address\": \"0x00007f3cb221fda0\" }, { \"line\": \"3081\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"callParserFunction()\", \"address\": \"0x00007f3cb221fc50\" }, { \"line\": \"276\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/PPFrame_Hash.php\", \"function\": \"braceSubstitution()\", \"address\": \"0x00007f3cb221f930\" }, { \"line\": \"3272\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"expand()\", \"address\": \"0x00007f3cb221f750\" }, { \"line\": \"276\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/PPFrame_Hash.php\", \"function\": \"braceSubstitution()\", \"address\": \"0x00007f3cb221f430\" }, { \"line\": \"2917\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"expand()\", \"address\": \"0x00007f3cb221f250\" }, { \"line\": \"1584\", \"file\": \"/srv/mediawiki/php-1.38.0-wmf.5/includes/parser/Parser.php\", \"function\": \"replaceVariables()\", \"address\": \"0x00007f3cb221f180\" } ]"
  },
  "process": {
    "pid": "22985"
  },
  "file": {
    "file.path": "/srv/mediawiki/docroot/wikipedia.org/w/index.php"
  }
}

Change 734698 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/software/ecs@master] add stack.head field for aggregating events by stack head

https://gerrit.wikimedia.org/r/734698

@colewhite my logs will (primarily) come from kubernetes; I don't see any kubernetes.* in the ECS docs, but I do need to add those tags like we do for all logs coming from kubernetes; will the resulting log message still be possibly ecs-compliant?

You are right, there are no kubernetes fields at the moment, only container fields which are quite limited. We'll want those. Feedback about how best to organize and document these fields is very much welcome. T292881

Change 739520 had a related patch set uploaded (by Giuseppe Lavagetto; author: Giuseppe Lavagetto):

[operations/puppet@production] profile::mediawiki::php: support kubernetes in php-fatal-error.php

https://gerrit.wikimedia.org/r/739520

Change 734698 merged by jenkins-bot:

[operations/software/ecs@master] add stack.head field for aggregating events by stack head

https://gerrit.wikimedia.org/r/734698

Change 739520 merged by Giuseppe Lavagetto:

[operations/puppet@production] profile::mediawiki::php: support kubernetes in php-fatal-error.php

https://gerrit.wikimedia.org/r/739520

After deploying the changes to php-fatal-error.php, we can now see the error messages delivered by php-wmerrors in logstash.

Change 734692 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: add handling of php-fpm logs via rsyslogd

https://gerrit.wikimedia.org/r/734692

For the record, the logs from k8s-mwdebug pods do show up in Logstash but not on the mwdebug Logstash dashboard (since the hostname isn't mwdebug*). But they do show up on the general mediawiki dashboard when querying for host:*pinkunicorn*.

For the record, the logs from k8s-mwdebug pods do show up in Logstash but not on the mwdebug Logstash dashboard (since the hostname isn't mwdebug*). But they do show up on the general mediawiki dashboard when querying for host:*pinkunicorn*.

FYI, I just changed the mwdebug logstash dashboard to also take into account logs with servergroup: kube-mw-debug, which brings the mw-debug logs into it.