give releng access to logs to debug buildkit-to-wmf-registry publishing
Open, Medium, Public

Description

This ticket is to discuss and possibly implement access for releng group shell users to any access logs that are considered helpful to debug buildkit-to-wmf-registry publishing problems.

re this IRC comment:


< dduvall> ..debug a buildkit-to-wmf-registry publishing problem...

< dduvall> long term, it would be great if we could figure out a level of access for releng folks to see the access logs because i imagine we might be debugging publishing issues once in a while once we start migrating projects

< dduvall> ah i just saw the comment here https://phabricator.wikimedia.org/T322453#8375676 maybe it's just about simply upgrading. however, it's probably still a good idea to see what's happening server side which i cannot

Let's start from here: define which hosts and which logs are needed, and then decide whether to provide access via admin groups or logstash.

We have another open ticket about getting gitlab logs into logstash.

Event Timeline

Dzahn updated the task description.
Dzahn added a subscriber: dduvall.

Thanks for filing this!

This is what would be helpful for us in debugging registry access issues.

  • nginx access logs and error logs on the registry hosts (registry{1003,1004}.eqiad.wmnet, registry{2003,2004}.codfw.wmnet)
  • docker-registry service logs and access logs on the registry hosts
  • jwt-authorizer systemd logs on the registry hosts

@dancy, @jnuche ^ does this seem exhaustive enough for us?

I can imagine a scenario where there might be issues with the swift backend, but I think anything at that level would be surfaced by the other logs somehow, and would probably be noticed and fixed by SRE very quickly.
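For reference, the brace shorthand in the host list above can be expanded mechanically. This is only a minimal sketch: the helper name `expand_braces` is hypothetical, and it handles a single `{a,b}` group per pattern, which is all the list above uses.

```python
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand one shell-style {a,b,...} group in a host pattern.

    Hypothetical helper for illustration; it expands only the first
    brace group it finds and returns the pattern unchanged if none exists.
    """
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head = pattern[: match.start()]
    tail = pattern[match.end():]
    return [head + option + tail for option in match.group(1).split(",")]


# Host patterns exactly as written in the list above.
REGISTRY_HOST_PATTERNS = [
    "registry{1003,1004}.eqiad.wmnet",
    "registry{2003,2004}.codfw.wmnet",
]

registry_hosts = [
    host
    for pattern in REGISTRY_HOST_PATTERNS
    for host in expand_braces(pattern)
]

for host in registry_hosts:
    print(host)
```

That yields four hosts, two in eqiad and two in codfw, which is the full set any log access or log shipping would need to cover.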

I would be OK with granting access to logs via logstash, but I'm quite opposed to widening access to the registry hosts; that doesn't seem really useful.

Also: both the registry and nginx keep access logs, so I guess it's enough to export one of the two.

[...]
Also: both the registry and nginx keep access logs, so I guess it's enough to export one of the two.

I'd say export those from docker-registry. It writes an access log plus additional stuff that might be useful.

I just added a small section to the Wikitech Logstash page describing how I got logs from "misc" systems into logstash with 2 changes.

The first change gets the logs into local syslog via rsyslog, and the second tells rsyslog to ship them to kafka. After that I could see them in logstash by filtering on "program". The example patches linked from that section are what I would copy to get the nginx logs there.

https://wikitech.wikimedia.org/wiki/Logstash#Getting_logs_from_misc_systems_into_logstash

[...]
Also: both the registry and nginx keep access logs, so I guess it's enough to export one of the two.

I'd say export those from docker-registry. It writes an access log plus additional stuff that might be useful.

There will be nginx access log entries that are not reflected in the docker-registry access logs that would be useful to us as well, most notably anything related to auth using the GitLab JWT tokens, subrequests between nginx and jwt-authorizer and the related nginx responses to the client. These do not hit docker-registry at all.

I would be OK with granting access to logs via logstash, but I'm quite opposed to widening access to the registry hosts; that doesn't seem really useful.

Also: both the registry and nginx keep access logs, so I guess it's enough to export one of the two.

FWIW we're not looking for anything close to root. We just need visibility into jwt-authorizer (access via journalctl -u jwt-authorizer would be plenty), the communication between nginx and jwt-authorizer (so the subrequest entries in the access logs), and the nginx responses from auth (which never hit docker-registry).
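To illustrate the kind of nginx entries meant here, below is a rough sketch that picks auth rejections (401/403) out of access log lines. It assumes nginx's default "combined" log format, which may not match the log_format actually configured on the registry hosts; the sample lines, paths, and the `auth_failures` helper are invented for illustration only.

```python
import re

# Assumption: nginx "combined" format. Only the leading fields are parsed;
# the referer and user agent that follow are ignored.
COMBINED_PREFIX = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

AUTH_FAILURE_STATUSES = {"401", "403"}


def auth_failures(lines):
    """Yield parsed fields for lines whose status is 401 or 403."""
    for line in lines:
        match = COMBINED_PREFIX.match(line)
        if match and match.group("status") in AUTH_FAILURE_STATUSES:
            yield match.groupdict()


# Made-up sample lines, not taken from any real host.
sample = [
    '203.0.113.7 - - [17/Jan/2023:16:17:00 +0000] '
    '"PUT /v2/repos/example/manifests/latest HTTP/1.1" 401 179 "-" "buildkit"',
    '203.0.113.7 - - [17/Jan/2023:16:17:02 +0000] '
    '"GET /v2/ HTTP/1.1" 200 2 "-" "buildkit"',
]

failures = list(auth_failures(sample))
for entry in failures:
    print(entry["time"], entry["status"], entry["request"])
```

Since these rejections are answered by nginx after the jwt-authorizer subrequest, they would never show up in the docker-registry logs, which is why exporting only the docker-registry access log would not be enough for this case.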

Thanks for filing this!

This is what would be helpful for us in debugging registry access issues.

  • nginx access logs and error logs on the registry hosts (registry{1003,1004}.eqiad.wmnet, registry{2003,2004}.codfw.wmnet)
  • docker-registry service logs and access logs on the registry hosts
  • jwt-authorizer systemd logs on the registry hosts

@dancy, @jnuche ^ does this seem exhaustive enough for us?

That list looks good to me.

LSobanski moved this task from Incoming to Backlog on the serviceops-collab board.
LSobanski lowered the priority of this task from High to Medium. Tue, Jan 17, 4:17 PM