
Stream a subset of mediawiki apache logs to logstash
Closed, Resolved (Public)

Description

We will experimentally stream MediaWiki Apache logs from one API server and one app server, and evaluate whether the information provided is useful during latency spikes.

Final URL: https://logstash.wikimedia.org/app/kibana#/dashboard/AXCHeGvOKWrIH1QRJXNu

Event Timeline

jijiki added a parent task: Restricted Task. (Feb 6 2020, 11:44 AM)

Change 571239 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP mediawiki: send apache logs to logstash

https://gerrit.wikimedia.org/r/571239

I have uploaded a patch that might work. My main issue is that I can't find a sane and safe way to test whether those logstash filters will do what we need. @herron, any ideas are welcome.

Hey @jijiki, usually to test/validate filters like this I'll cherry pick or live-hack the logstash config on the beta cluster and generate the desired traffic there to see how logstash behaves. There are some details at https://wikitech.wikimedia.org/wiki/Logstash#Beta_Cluster_Logstash

Change 572057 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP hieradata: test streaming apache logs to logstash from mwdebug1001

https://gerrit.wikimedia.org/r/572057

@herron I fiddled a bit on beta. It appears that, for some reason, nothing has been streamed there since today; I am not sure whether I broke it myself while trying to make it work :/ FWIW, I have restored the original config on deployment-logstash03.

Fwiw I do see logs flowing into logstash-beta generally, but puppet was broken in the beta cluster because the master filled its disk. The puppet master on deployment-puppetmaster04.deployment-prep.eqiad.wmflabs seems to be logging at debug level, making puppet runs super slow and rapidly filling the disk. I don't have time at the moment, but if it is still broken in the morning I'll take a closer look.

Learned today that T243226 is tracking the current beta cluster puppetmaster issues.

Oh great, thanks! From https://logstash-beta.wmflabs.org/, it looks like logs are flowing, so maybe we can try the change there after all.

I've cherry picked https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/571239/ on deployment-puppetmaster04.deployment-prep.eqiad.wmflabs (and made a minor change in patchset 9, since logstash was complaining about the quotes). The config loads ok in logstash.

Also I set profile::mediawiki::webserver::stream_to_logstash: true in the deployment-prep puppet project hiera via horizon.

Now when running puppet on e.g. deployment-mediawiki-09.deployment-prep.eqiad.wmflabs, puppet is throwing:

deployment-mediawiki-09:~$ sudo puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Method call, 'split' parameter 'str' expects a String value, got Undef (file: /etc/puppet/modules/profile/manifests/mediawiki/webserver.pp, line: 143, column: 38) on node deployment-mediawiki-09.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

It appears that on beta the expression $server_role = $::_role.split('/')[-1] is not evaluated properly, while in production it looks just fine: https://puppet-compiler.wmflabs.org/compiler1001/20912/mwdebug1001.eqiad.wmnet/. I believe this has something to do with this server's role in beta?

@herron any ideas how to proceed here? Is there someone who can help? Apparently this patch could potentially break beta.

Looking a bit closer I think this is happening because the nodes in labs are assigned their roles/profiles/etc via the external node classifier in horizon, which isn't making the call to role() as we do in prod and so $::_role isn't set in the process.

For a workaround, one idea is to wrap this in something like if defined('$::_role') and add an else case that configures a generic, prefixless "mediawiki-access-log". I only see 4 hits with git grep $::_role, so there isn't much precedent to pull from. Maybe worth a convo/sanity check within serviceops?
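To make the suggested fallback concrete, here is a minimal sketch of the logic, modeled in Python rather than Puppet so it can be run on its own. The function name access_log_tag, the example role string, and the "<role>-mediawiki-access-log" prefix format are assumptions for illustration; only the prefixless "mediawiki-access-log" default and the split('/')[-1] step come from the discussion above.

```python
from typing import Optional

# Generic, prefixless name proposed in the comment above.
DEFAULT_TAG = "mediawiki-access-log"


def access_log_tag(role: Optional[str]) -> str:
    """Return the log tag, prefixed with the last role component if a role is set.

    Hypothetical helper: the prefix format is an assumption, not the merged patch.
    """
    if not role:
        # Beta/labs case: the role was never set, so there is nothing to split.
        return DEFAULT_TAG
    # Production case: mirrors $::_role.split('/')[-1] from the manifest.
    server_role = role.split("/")[-1]
    return f"{server_role}-{DEFAULT_TAG}"


if __name__ == "__main__":
    # Example role strings are made up for illustration.
    print(access_log_tag("mediawiki/appserver/api"))
    print(access_log_tag(None))
```

The point of the guard is that an unset role yields a usable default instead of failing catalog compilation, which is the failure puppet reported on beta.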

I have uploaded a patch, which I manually tried on beta. It seems to work, but sadly puppet breaks a bit further down the road.

With a little bit more fiddling, I managed to run puppet on deployment-mediawiki-09.deployment-prep.eqiad.wmflabs! @herron does that unblock you?

After a lot of fiddling with @herron, we are finally at this https://phabricator.wikimedia.org/P10513 !

The resource field needs a bit more attention, but I think we can go on to try in production.

Change 571239 merged by Herron:
[operations/puppet@production] mediawiki: stream apache logs to logstash

https://gerrit.wikimedia.org/r/571239

Change 572057 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: test streaming apache logs to logstash from mwdebug1001

https://gerrit.wikimedia.org/r/572057

Change 575000 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] logstash, mediawiki: minor fixes in log streaming

https://gerrit.wikimedia.org/r/575000

Change 575000 merged by Herron:
[operations/puppet@production] logstash, mediawiki: minor fixes in log streaming

https://gerrit.wikimedia.org/r/575000

Change 575329 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: send mw1262's apache logs to logstash

https://gerrit.wikimedia.org/r/575329

Change 575329 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: send mw1262's apache logs to logstash

https://gerrit.wikimedia.org/r/575329

Change 575474 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: send mw1276's apache logs to logstash

https://gerrit.wikimedia.org/r/575474

Change 575474 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: send mw1276's apache logs to logstash

https://gerrit.wikimedia.org/r/575474

jijiki updated the task description.

Thank you @herron and @fgiunchedi for your help!

Change 587289 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] mediawiki: Document the apache sample hosts

https://gerrit.wikimedia.org/r/587289

Change 587289 merged by Dzahn:
[operations/puppet@production] mediawiki: Document the apache sample hosts

https://gerrit.wikimedia.org/r/587289

Just a note: the Apache logs are still emitted to logstash for mw1262 and mw1276.

hieradata/hosts/mw1262.yaml
profile::mediawiki::webserver::stream_to_logstash: true

If that is no longer needed, maybe it should be removed? Otherwise, just disregard :]