give releng access to logs to debug buildkit-to-wmf-registry publishing
Closed, Resolved, Public

Description

This ticket is to discuss, and possibly implement, access for releng group shell users to any access logs considered helpful for debugging buildkit-to-wmf-registry publishing problems.

re this IRC comment:


< dduvall> ..debug a buildkit-to-wmf-registry publishing problem...

< dduvall> long term, it would be great if we could figure out a level of access for releng folks to see the access logs because i imagine we might be debugging publishing issues once in a while once we start migrating projects

< dduvall> ah i just saw the comment here https://phabricator.wikimedia.org/T322453#8375676 maybe it's just about simply upgrading. however, it's probably still a good idea to see what's happening server side which i cannot

Let's start from here: define which hosts and which logs, and then decide whether we want to do that via admin groups or logstash.

We have another open ticket about getting gitlab logs into logstash.

Event Timeline

Dzahn updated the task description.
Dzahn added a subscriber: dduvall.

Thanks for filing this!

This is what would be helpful for us in debugging registry access issues.

  • nginx access logs and error logs on the registry hosts (registry{1003,1004}.eqiad.wmnet, registry{2003,2004}.codfw.wmnet)
  • docker-registry service logs and access logs on the registry hosts
  • jwt-authorizer systemd logs on the registry hosts

@dancy, @jnuche ^ does this seem exhaustive enough for us?

I can imagine a scenario where there might be issues with the swift backend, but I think anything at that level would be surfaced by the other logs somehow, and would probably be noticed and fixed by SRE very quickly.

I would be OK with granting access to logs via logstash, but I'm quite opposed to widening access to the registry hosts; it doesn't seem really useful.

Also: both the registry and nginx keep access logs, so I guess it's enough to export one of the two.

I'd say export those from docker-registry. It writes the access log plus additional information that might be useful.

I just added a small section to the Wikitech Logstash page describing how I got logs from "misc" systems into logstash with two changes: one to get them into the local syslog via rsyslog, and another to tell rsyslog to ship them to kafka. After that I could see them in logstash by filtering on "program". The example patches linked below are what I would copy to get nginx logs there.

https://wikitech.wikimedia.org/wiki/Logstash#Getting_logs_from_misc_systems_into_logstash

There will be nginx access log entries that are not reflected in the docker-registry access logs that would be useful to us as well, most notably anything related to auth using the GitLab JWT tokens, subrequests between nginx and jwt-authorizer and the related nginx responses to the client. These do not hit docker-registry at all.

FWIW we're not looking for anything close to root. We just need visibility into jwt-authorizer (journalctl -u jwt-authorizer would be plenty), the communication between nginx and jwt-authorizer (so subrequest entries in the access logs), and the nginx responses from auth (which never hit docker-registry).
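To illustrate the kind of entries meant here, below is a minimal sketch that picks auth rejections out of nginx access log lines. It assumes the default nginx "combined" log format and assumes that rejected requests show up as 401 or 403 responses on `/v2/` paths; the actual `log_format` and status codes used on the registry hosts are not shown in this task. The function name `auth_rejections` and the sample lines are hypothetical.

```python
import re

# Default nginx "combined" log format. The real log_format on the registry
# hosts is not shown in this task, so this pattern is an assumption.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def auth_rejections(lines):
    """Yield parsed entries for /v2/ requests answered with 401 or 403."""
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        entry = match.groupdict()
        parts = entry["request"].split()
        path = parts[1] if len(parts) >= 2 else ""
        if entry["status"] in {"401", "403"} and path.startswith("/v2/"):
            yield entry


# Made-up sample lines, for illustration only (not real registry traffic).
SAMPLE_LINES = [
    '192.0.2.10 - - [12/May/2023:09:02:00 +0000] '
    '"POST /v2/example/image/blobs/uploads/ HTTP/1.1" 401 0 "-" "example-client"',
    '192.0.2.10 - - [12/May/2023:09:02:01 +0000] '
    '"POST /v2/example/image/blobs/uploads/ HTTP/1.1" 202 0 "-" "example-client"',
    '192.0.2.11 - - [12/May/2023:09:02:02 +0000] '
    '"GET /other HTTP/1.1" 403 0 "-" "example-client"',
]

for hit in auth_rejections(SAMPLE_LINES):
    print(hit["time_local"], hit["status"], hit["request"])
```

Requests rejected at this layer would only ever appear in the nginx logs, which is why exporting the docker-registry logs alone would not be enough for us.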

That list looks good to me.

LSobanski moved this task from Incoming to Backlog on the collaboration-services board.
LSobanski lowered the priority of this task from High to Medium. Jan 17 2023, 4:17 PM
eoghan changed the task status from Open to In Progress. May 12 2023, 9:02 AM
eoghan claimed this task.
eoghan moved this task from Backlog to Work in Progress on the collaboration-services board.

Change 919350 had a related patch set uploaded (by EoghanGaffney; author: EoghanGaffney):

[operations/puppet@production] Add nginx logs for docker-registry host to rsyslog

https://gerrit.wikimedia.org/r/919350

Change 919351 had a related patch set uploaded (by EoghanGaffney; author: EoghanGaffney):

[operations/puppet@production] Send nginx and docker-registry logs to kafka

https://gerrit.wikimedia.org/r/919351

@dduvall I put up some changes to move the nginx (/var/log/nginx/{error,access}.log) and docker-registry (journalctl -u docker-registry) logs to logstash. The logs for jwt-authorizer seem to be empty, though (from looking at journalctl -u jwt-authorizer). Is there another place they might be?

Change 919350 merged by EoghanGaffney:

[operations/puppet@production] Add nginx logs for docker-registry host to rsyslog

https://gerrit.wikimedia.org/r/919350

Change 919351 merged by EoghanGaffney:

[operations/puppet@production] Send nginx and docker-registry logs to kafka

https://gerrit.wikimedia.org/r/919351

Change 930719 had a related patch set uploaded (by EoghanGaffney; author: EoghanGaffney):

[operations/puppet@production] registry: Add nginx logs to rsyslog

https://gerrit.wikimedia.org/r/930719

Change 930719 merged by EoghanGaffney:

[operations/puppet@production] registry: Add nginx logs to rsyslog

https://gerrit.wikimedia.org/r/930719

@dduvall I've fixed the problem that caused the nginx logs not to be included. You can now see all the logs from these hosts by filtering for program being one of docker-registry, input-file-registry-nginx-access, and input-file-registry-nginx-error.

Here's a sample query: https://logstash.wikimedia.org/goto/ce91b70ec9e282c5ed7fbba0cc3fddd3
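For anyone who prefers to type the filter by hand, here is a minimal sketch that builds an equivalent query string. It assumes the dashboard search bar is in Lucene query mode and that the field is named `program`, as in the filter described above; the helper name `program_filter` is hypothetical, and the nginx access program name is assumed to be `input-file-registry-nginx-access`, by analogy with the error one.

```python
def program_filter(programs):
    """Build a Lucene query-string clause matching any of the given program values."""
    names = list(programs)
    if not names:
        raise ValueError("at least one program name is required")
    # Quote each value and escape backslashes and double quotes inside it.
    quoted = [
        '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for name in names
    ]
    return "program:(" + " OR ".join(quoted) + ")"


# Program names taken from the comment above (access name is an assumption).
REGISTRY_PROGRAMS = [
    "docker-registry",
    "input-file-registry-nginx-access",
    "input-file-registry-nginx-error",
]

print(program_filter(REGISTRY_PROGRAMS))
```

Narrowing the list to a single program gives a clause for just the nginx error log, for example.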

Feel free to re-open if there are any problems!