We will experimentally stream MediaWiki Apache logs from one API server and one app server, and evaluate whether the information provided is useful during latency spikes.
Final URL: https://logstash.wikimedia.org/app/kibana#/dashboard/AXCHeGvOKWrIH1QRJXNu
Status | Subtype | Assigned | Task
---|---|---|---
Restricted Task | | |
Resolved | | jijiki | T244472 Stream a subset of mediawiki apache logs to logstash
Change 571239 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP mediawiki: send apache logs to logstash
I have uploaded a patch that could possibly work. My issue is that I can't find a sane and safe way to test whether those logstash filters will do what we need. @herron any ideas are welcome.
Hey @jijiki, usually to test/validate filters like this I'll cherry pick or live-hack the logstash config on the beta cluster and generate the desired traffic there to see how logstash behaves. There are some details at https://wikitech.wikimedia.org/wiki/Logstash#Beta_Cluster_Logstash
Change 572057 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] WIP hieradata: test streaming apache logs to logstash from mwdebug1001
@herron I fiddled a bit on beta. It appears that, for some reason, nothing has been streamed there since today; I am not sure if I broke it myself while trying to make it work :/ FWIW, I have restored the original config on deployment-logstash03.
Fwiw I do see logs flowing into logstash-beta generally, but puppet was broken in the beta cluster because the master filled its disk. Puppet master on deployment-puppetmaster04.deployment-prep.eqiad.wmflabs seems to be logging at debug level, making puppet runs super slow and rapidly filling the disk. I don't have time at the moment, but if still broken in the morning I'll take a closer look.
Oh great, thanks! From https://logstash-beta.wmflabs.org/ it looks like logs are flowing, so maybe we can try the change there after all.
I've cherry picked https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/571239/ on deployment-puppetmaster04.deployment-prep.eqiad.wmflabs (and made a minor change in patchset 9, since logstash was complaining about the quotes). The config loads ok in logstash.
Also I set profile::mediawiki::webserver::stream_to_logstash: true in the deployment-prep puppet project hiera via horizon.
Now when running puppet on e.g. deployment-mediawiki-09.deployment-prep.eqiad.wmflabs, puppet is throwing:
deployment-mediawiki-09:~$ sudo puppet agent -t
Info: Using configured environment 'production'
Info: Retrieving pluginfacts
Info: Retrieving plugin
Info: Loading facts
Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Method call, 'split' parameter 'str' expects a String value, got Undef (file: /etc/puppet/modules/profile/manifests/mediawiki/webserver.pp, line: 143, column: 38) on node deployment-mediawiki-09.deployment-prep.eqiad.wmflabs
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run
It appears that on beta the expression $server_role = $::_role.split('/')[-1] is not evaluated properly, while in production it looks just fine: https://puppet-compiler.wmflabs.org/compiler1001/20912/mwdebug1001.eqiad.wmnet/ . I believe this has something to do with this server's role in beta?
@herron any ideas how to proceed here? Is there someone who can help? Apparently this patch could potentially break beta.
Looking a bit closer I think this is happening because the nodes in labs are assigned their roles/profiles/etc via the external node classifier in horizon, which isn't making the call to role() as we do in prod and so $::_role isn't set in the process.
For a workaround, one idea is to wrap this in something like if defined('$::_roles') and setting an else case that will configure a generic prefixless "mediawiki-access-log"? I only see 4 hits with git grep $::_role, so not finding much to pull from in terms of precedent. Maybe worth a convo/sanity check within serviceops?
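The workaround suggested above could look roughly like the following. This is only a sketch under stated assumptions, not the patch that was eventually merged: it guards on `$::_role` (the variable the failing `split` call dereferences, whereas the comment above writes `$::_roles`), and the `$log_name` variable and the prefixed name format are hypothetical. Only the prefixless "mediawiki-access-log" fallback comes from the comment itself.

```puppet
# Sketch only: $log_name and the prefixed format are hypothetical names
# chosen for illustration; they are not taken from the actual patch.
if defined('$::_role') {
    # Production path: role() has set $::_role, e.g. 'mediawiki/appserver'.
    # Keep only the last path component of the role name.
    $server_role = $::_role.split('/')[-1]
    $log_name    = "${server_role}-mediawiki-access-log"
} else {
    # Beta/labs path: roles are assigned via the external node classifier
    # in horizon, so $::_role is undefined. Fall back to a generic,
    # prefixless name instead of calling split() on Undef.
    $log_name = 'mediawiki-access-log'
}
```

With a guard like this, catalog compilation on hosts without `$::_role` would skip the `split` call entirely, which is the call that raised the "expects a String value, got Undef" error.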
I have uploaded a patch which I manually tried on beta, this seems to work, but sadly, puppet breaks a bit further down the road
With a little bit more fiddling, I managed to run puppet on deployment-mediawiki-09.deployment-prep.eqiad.wmflabs! @herron does that unblock you?
After a lot of fiddling with @herron, we are finally at this https://phabricator.wikimedia.org/P10513 !
The resource field needs a bit more attention, but I think we can go on to try in production.
Change 571239 merged by Herron:
[operations/puppet@production] mediawiki: stream apache logs to logstash
Change 572057 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: test streaming apache logs to logstash from mwdebug1001
Change 575000 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] logstash, mediawiki: minor fixes in log streaming
Change 575000 merged by Herron:
[operations/puppet@production] logstash, mediawiki: minor fixes in log streaming
Change 575329 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: send mw1262's apache logs to logstash
Change 575329 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: send mw1262's apache logs to logstash
Change 575474 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] hieradata: send mw1276's apache logs to logstash
Change 575474 merged by Effie Mouzeli:
[operations/puppet@production] hieradata: send mw1276's apache logs to logstash
Change 587289 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/puppet@production] mediawiki: Document the apache sample hosts
Change 587289 merged by Dzahn:
[operations/puppet@production] mediawiki: Document the apache sample hosts
Just a note: the Apache logs are still emitted to logstash for mw1262 and mw1276
profile::mediawiki::webserver::stream_to_logstash: true
If that is no longer needed, maybe it should be removed? Otherwise just disregard :]