Provide centralized logging (logstash) for Toolforge
Open, Medium, Public

Description

One log to rule them all.

It would be good to have logstash for at least the tools-ops logs, which include:

  • basic system logging (dmesg / syslog)
  • all the mails that now end up in my inbox ;-)
  • infra logging (puppet, apt, diamond)
  • mail (exim on tools-mailrelay)
  • SGE (what kind of logs do we have there?)
  • bigbrother actions
  • Redis?
  • ssh (also useful to help people with issues logging in)

Most important in practice would be the logs that relate to warnings from shinken-wm, which include:

  • puppet staleness/failures
  • ssh

and the issues we get mails about, which are:

  • sge
  • apt
  • raid??
  • exim paniclog

Event Timeline

valhallasw raised the priority of this task from to Medium.
valhallasw updated the task description.
valhallasw added a project: Toolforge.
valhallasw added subscribers: coren, scfc, Aklapper, yuvipanda.

10:09 valhallasw`cloud: created toolsbeta-logstash to play around with logstash and figure out what we need for tools (phab:T97861)
10:25 valhallasw`cloud: set Hiera variable "elasticsearch::cluster_name": toolsbeta-logstash-eqiad
10:30 valhallasw`cloud: pulled new changes into puppetmaster to get https://github.com/wikimedia/operations-puppet/commit/4afd23d8e2905a84ef211ad92e8314173eb743ba in
10:37 valhallasw`cloud: that doesn't seem to be applied... setting has_ganglia: false manually in wikitech hiera
11:11 valhallasw`cloud: commenting out include ::elasticsearch::ganglia in role::logstash seems to work. I think we have to write our own tools logstash roles anyway in the end, as the role::logstash code contains e.g. mediawiki specific code

Unfortunately, logstash doesn't actually start and crashes with:

Errno::EBADF: Bad file descriptor - Bad file descriptor
          close at org/jruby/RubyIO.java:2097
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:173
           each at org/jruby/RubyArray.java:1613
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:139
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:406
           call at org/jruby/RubyProc.java:271
          fetch at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb:48
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:403
        execute at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:319
           get! at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:217
       register at /opt/logstash/lib/logstash/outputs/elasticsearch_http.rb:117
           each at org/jruby/RubyArray.java:1613
   outputworker at /opt/logstash/lib/logstash/pipeline.rb:220
  start_outputs at /opt/logstash/lib/logstash/pipeline.rb:152

This was because elasticsearch wasn't started. With elasticsearch running, logstash starts, but that doesn't actually give us an interface yet...
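
Since the crash above came from logstash trying to connect to an elasticsearch that wasn't running, a small pre-flight check could catch this before logstash is started. The following is only a sketch: the helper name `is_port_open` is made up for illustration, and it assumes elasticsearch listens on its default HTTP port (9200) on the same host, which may not match the actual toolsbeta-logstash setup.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


if __name__ == "__main__":
    # Assumption: elasticsearch is on this host, on its default HTTP port 9200.
    if is_port_open("localhost", 9200):
        print("elasticsearch is reachable; logstash can be started")
    else:
        print("elasticsearch is not reachable; start it before logstash")
```

This only checks that something accepts TCP connections on the port, not that elasticsearch is healthy, so it is a rough guard rather than a real health check.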

valhallasw moved this task from Triage to Backlog on the Toolforge board. May 10 2015, 8:43 PM
Restricted Application added a project: Cloud-Services. Oct 24 2015, 4:56 PM
fgiunchedi renamed this task from Provide centralized logging (logstash) to Provide centralized logging (logstash) for Toolforge. Oct 1 2018, 1:13 PM
fgiunchedi removed a project: Cloud-Services.
fgiunchedi added a subscriber: fgiunchedi.

Unlinking from T198756 as Toolforge is out of scope for the current goals, though the design/implementation can equally be applied to Toolforge.