
[toolforge,infra] Centralized logging for Toolforge infrastructure logs
Open · Medium · Public · Feature

Description

One log to rule them all.

It would be good to have Logstash for at least the tools-ops logs, which include:

  • basic system logging (dmesg / systemd)
  • k8s logs (maybe not tools, but kube-system/toolforge components at least)
  • toolforge component deployment actions
  • maintain-harbor (currently this may be logging to the worker's local filesystem; we might not need that anymore if we can pull the logs directly from k8s, see T383081: Persist important toolforge k8s components logs)

Systemd would cover most (if not all) of the VM services (ssh, puppet, redis, nginx, ...).

Being able to then search through it (Kibana style) would help enormously when debugging issues, especially for k8s components whose logs may exist only in the pods or locally on the workers.

Details

Related Changes in GitLab:
Title: logging: Deploy tools Loki buckets
Reference: repos/cloud/toolforge/tofu-provisioning!55
Author: taavi
Source branch: main-If5631649c079906d83e832a7a92149039289c0e8
Dest branch: main

Event Timeline

valhallasw raised the priority of this task to Medium.
valhallasw updated the task description. (Show Details)
valhallasw added a project: Toolforge.
valhallasw added subscribers: coren, scfc, Aklapper, yuvipanda.

10:09 valhallasw`cloud: created toolsbeta-logstash to play around with logstash and figure out what we need for tools (phab:T97861)
10:25 valhallasw`cloud: set Hiera variable "elasticsearch::cluster_name": toolsbeta-logstash-eqiad
10:30 valhallasw`cloud: pulled new changes into puppetmaster to get https://github.com/wikimedia/operations-puppet/commit/4afd23d8e2905a84ef211ad92e8314173eb743ba in
10:37 valhallasw`cloud: that doesn't seem to be applied... setting has_ganglia: false manually in wikitech hiera
11:11 valhallasw`cloud: commenting out include ::elasticsearch::ganglia in role::logstash seems to work. I think we have to write our own tools logstash roles anyway in the end, as the role::logstash code contains e.g. mediawiki specific code

Unfortunately, logstash doesn't actually start and crashes with

Errno::EBADF: Bad file descriptor - Bad file descriptor
          close at org/jruby/RubyIO.java:2097
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:173
           each at org/jruby/RubyArray.java:1613
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/connection.rb:139
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:406
           call at org/jruby/RubyProc.java:271
          fetch at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/pool.rb:48
        connect at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:403
        execute at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:319
           get! at /opt/logstash/vendor/bundle/jruby/1.9/gems/ftw-0.0.39/lib/ftw/agent.rb:217
       register at /opt/logstash/lib/logstash/outputs/elasticsearch_http.rb:117
           each at org/jruby/RubyArray.java:1613
   outputworker at /opt/logstash/lib/logstash/pipeline.rb:220
  start_outputs at /opt/logstash/lib/logstash/pipeline.rb:152

This turned out to be because Elasticsearch wasn't started. Starting it gets Logstash running, but that still doesn't give us an interface yet...

fgiunchedi renamed this task from Provide centralized logging (logstash) to Provide centralized logging (logstash) for Toolforge.Oct 1 2018, 1:13 PM
fgiunchedi removed a project: Cloud-Services.
fgiunchedi subscribed.

Unlinking from T198756 as Toolforge is out of scope for the current goals, though the design/implementation could equally be applied to Toolforge.

A side note: at this point, one of the only open-source multitenant solutions for this kind of thing seems to be Grafana Loki: https://grafana.com/docs/loki/latest/overview/

Re-opening this, as it isn't really a duplicate. Instead, both this and the other task should sit under a common parent task.

lmata moved this task from Radar to Inbox on the observability board.
lmata subscribed.

Hello, is there something for us (o11y) here, or should we just stay in the loop for potential collaboration? Subscribing and adding to our radar for now.

This is something to discuss and potentially collaborate on. I'll follow-up with you.

dcaro renamed this task from Provide centralized logging (logstash) for Toolforge to [toolforge.infra] Provide centralized logging (logstash) for Toolforge.Feb 21 2024, 10:20 AM
dcaro reopened this task as Open.
dcaro added subscribers: yuvipanda, EBernhardson.
bd808 changed the subtype of this task from "Task" to "Feature Request".Jan 13 2025, 11:13 PM
dcaro renamed this task from [toolforge.infra] Provide centralized logging (logstash) for Toolforge to [toolforge.infra] Provide centralized logging (logstash) for Toolforge platform.Jan 15 2025, 3:05 PM
taavi renamed this task from [toolforge.infra] Provide centralized logging (logstash) for Toolforge platform to [toolforge.infra] Provide centralized logging for Toolforge platform.Jan 15 2025, 3:10 PM

Just so I understand the relationship between this and T127367... this is specifically about logs for admins/infra components, and T127367 is about logs for tools themselves?

Yep, it might not have started like that, but that's the current split: this task is for the Toolforge platform itself (used by Toolforge roots), while the other task is for the tools themselves, used by Toolforge users.

dcaro renamed this task from [toolforge.infra] Provide centralized logging for Toolforge platform to [toolforge,infra] Provide centralized logging for Toolforge platform.Feb 12 2025, 10:15 PM
taavi renamed this task from [toolforge,infra] Provide centralized logging for Toolforge platform to [toolforge,infra] Cntralized logging for Toolforge infrastructure logs.May 23 2025, 2:32 PM
taavi removed a project: observability.
dcaro renamed this task from [toolforge,infra] Cntralized logging for Toolforge infrastructure logs to [toolforge,infra] Centralized logging for Toolforge infrastructure logs.Jun 25 2025, 8:15 AM

@taavi hey, can you update this task with your plans for using Loki for this, and how it fits into the overall picture for centralized infra logs?

I'm concerned that Loki may not be simple/stable enough to use for infra things, especially as we would want those logs when k8s fails/has issues xd

Yes. My plan is to feed the logs of everything running inside the Kubernetes cluster itself to a Loki instance hosted in there. That primarily covers the various components, which need the cluster to work in the first place, and which carry a bunch of metadata from the cluster API that we want to include in the logs (again placing a dependency on the k8s API for any solution).
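Whatever agent ends up shipping these logs, it ultimately produces Loki's push-API request body. A minimal Python sketch of that payload shape (the label names and the example log line are illustrative assumptions, not the actual deployment):

```python
import json
import time


def loki_push_payload(labels, lines):
    """Build a request body for Loki's HTTP push API (/loki/api/v1/push).

    Loki groups entries into streams by label set, and expects each entry
    as a [nanosecond-timestamp-as-string, log-line] pair.
    """
    now_ns = str(time.time_ns())
    return {
        "streams": [
            {
                "stream": labels,
                "values": [[now_ns, line] for line in lines],
            }
        ]
    }


# Hypothetical example: a log line from a kube-system component.
payload = loki_push_payload(
    {"namespace": "kube-system", "component": "kube-apiserver"},
    ["apiserver started"],
)
# This JSON body would then be POSTed to the in-cluster Loki endpoint.
print(json.dumps(payload, indent=2))
```

In practice an off-the-shelf collector (Promtail, Alloy, fluent-bit, ...) builds this for you and attaches the pod/namespace labels from the cluster API, which is exactly the metadata dependency described above.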

It doesn't give a general solution for logging things that run outside the cluster (nor for the cluster infrastructure itself), but it doesn't make the logging situation there any worse. And I think those two problems will need fairly different solutions anyway; for example, VM logs IMHO should be handled via the rsyslog daemon running on each VM (which in fact doesn't work with Loki).
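For the VM side, the rsyslog approach would boil down to a per-host forwarding rule. A rough sketch, assuming a hypothetical central aggregator host (the hostname, file name, and port are assumptions, not an actual deployment):

```conf
# Hypothetical /etc/rsyslog.d/30-central.conf on a Toolforge VM:
# forward everything to a central aggregator over TCP, with a disk-assisted
# queue so logs survive short aggregator outages.
*.* action(type="omfwd"
           target="logging-aggregator.example.wmcloud.org"
           port="514"
           protocol="tcp"
           queue.type="LinkedList"
           queue.filename="central_fwd"
           action.resumeRetryCount="-1")
```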

Hmm, does this mean that you foresee having two centralized places for logs? It's better than not having any, but I would prefer having only one place xd

As far as I understand, having container logs from k8s pushed into rsyslog is actually possible (probably as a nicer implementation of T383081: Persist important toolforge k8s components logs). Can't we use that to log everything we want there, instead of having two solutions?
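One way to do that would be rsyslog's imfile module tailing the container log files the kubelet writes on each worker. A rough sketch (the path is the conventional kubelet location; the tag and facility are assumptions):

```conf
# Hypothetical rsyslog input on a k8s worker: tail the container log files
# the kubelet writes and tag them for forwarding/filtering.
module(load="imfile")
input(type="imfile"
      file="/var/log/containers/*.log"
      tag="k8s-container:"
      severity="info"
      facility="local0")
```

Note this only captures what lands on the worker's filesystem; it would not carry the cluster-API metadata (labels, namespaces) that a Loki collector attaches.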

I'm not saying that we should not split it in two, but I would like to understand the rationale in more detail before doing so, as I think it loses some of its value and potentially complicates the system (by having two very different ways of collecting, storing and exposing logs).

I agree that in an ideal world with infinite engineering resources, it would be great to have some system to collect all the Toolforge infrastructure logs, both from hosts and from in K8s, instead of separate systems for that. Unfortunately we do not currently live in such an ideal world.

I would much prefer to re-use the work I'm doing for tool logging in Loki here to get at least a partial solution that gives us centralized logging for a major, well-defined part of our operational logs (which also happens to be the part where the logs are most quickly lost if you do nothing, as that ticket demonstrates), instead of doing nothing now and having absolutely no central logging until we engineer a full solution that catches all the logs.

(T383081: Persist important toolforge k8s components logs seems like a partial dupe of this, so unless you have objections I'll merge that here!)

I'm not convinced that we are in such a dire situation that the extra complexity is worth maintaining. Can we discuss it in a decision request?

That'd help both make sure we explore the major options and consequences, and align ourselves.

In the meantime, this does not have to stop or modify the work on the tools logging; it should be independent. So we can continue with that, and it will also help us get experience with Loki and the setup, and inform the infra decision.

T383081: Persist important toolforge k8s components logs is a temporary hack to persist, to some extent, the logs of some of the critical components we have. It's not meant to be a centralized solution in any way (it just persists the logs in the syslog of the worker/controller, nothing else), so I think it's not really a duplicate.

taavi removed taavi as the assignee of this task.Aug 29 2025, 1:37 PM
taavi subscribed.

Un-claiming for now.