Requesting access to graphite hosts for addshore
Closed, Resolved · Public

Description

Username: Addshore
Full name: Adam Shorland

Over the past years the use of graphite for storing various metrics at WMDE has increased.
I would like to request access to the graphite hosts and whatever is needed to be able to:

  • Delete old unused metrics, such as T121521 T140280 T157012 T121523
  • Investigate and fix odd aggregations (as was done in T199968)
  • Merge metrics as metric names / locations change during development (as was done in T196609 but no merge was done)
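
For the first item, a minimal sketch of how candidate files for deletion could be listed before anything is removed. It assumes that a whisper file's modification time is a reasonable proxy for the last time a datapoint was written; the function name `find_stale_metrics` and the age threshold are hypothetical, and the sketch only reports files, it never deletes them.

```python
import os
import time
from pathlib import Path


def find_stale_metrics(root, max_age_days, now=None):
    """Return .wsp files under `root` not modified in the last `max_age_days` days.

    Assumption: mtime approximates the last datapoint write.
    This function is read-only; it does not delete anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".wsp"):
                continue  # ignore anything that is not a whisper file
            path = Path(dirpath) / name
            if path.stat().st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

The resulting list can be reviewed (and logged) before any removal is done by hand.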

I seem to remember having a discussion about this some years ago somewhere but I couldn't find any references to it.

  • access request (or expansion) has sign-off of WMF sponsor/manager (sponsor for volunteers, manager for WMF staff)
  • non-sudo requests: 3 business day wait must pass with no objections being noted on the task
  • sudo requests: all sudo requests require explicit approval during the weekly operations team meeting. No sudo requests will be approved outside of those meetings without the direct override of the Director of Operations.
  • patchset for access request

Event Timeline

Restricted Application added a project: Operations. · Nov 5 2018, 5:30 PM
Restricted Application added a subscriber: Aklapper.
colewhite triaged this task as Normal priority. · Nov 5 2018, 8:34 PM
ArielGlenn updated the task description. · Nov 12 2018, 10:37 AM

@Addshore It looks like you need write access to /var/lib/carbon/whisper on graphite1001 and graphite2001, which means being able to remove or move things in there as the _graphite user. If so, we will need a new group for this; I don't see any existing group that does what you want. Will that also cover merging of metrics?

Please get manager sign-off, we'll need it no matter what.

As far as I remember from the last time I discussed this with someone, there are a variety of scripts that allow you to perform operations on whisper metrics.
These scripts are documented at https://github.com/graphite-project/whisper/blob/master/README.md

As far as I know I would essentially need to be able to run most if not all of these scripts as the _graphite user:

  • whisper-info (get info about a metric)
  • whisper-merge / whisper-diff (diff & merge two files)
  • whisper-resize & whisper-set-aggregation-method (fix bad metrics / aggregations etc)

And yes, as far as I know running these scripts would require write access to /var/lib/carbon/whisper on the graphite hosts.
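
A rough sketch of what running these scripts as the _graphite user could look like, written as a small helper that only builds the command lines and does not execute them. It assumes access is granted through `sudo -u _graphite`, uses the script names as written in this task (upstream ships them with a `.py` suffix), and relies on the usual mapping of a dotted metric name to a `.wsp` file under /var/lib/carbon/whisper. All function names and the metric names in the usage lines are hypothetical.

```python
from pathlib import PurePosixPath

WHISPER_ROOT = PurePosixPath("/var/lib/carbon/whisper")  # path named in this task
GRAPHITE_USER = "_graphite"
# Aggregation methods commonly documented for whisper; newer versions may accept more.
AGGREGATION_METHODS = ("average", "sum", "last", "max", "min")


def metric_path(metric, root=WHISPER_ROOT):
    """Map a dotted metric name (a.b.c) to its whisper file (root/a/b/c.wsp)."""
    parts = metric.split(".")
    if any(part == "" or "/" in part for part in parts):
        raise ValueError(f"invalid metric name: {metric!r}")
    return root.joinpath(*parts[:-1], parts[-1] + ".wsp")


def as_graphite(*argv):
    """Prefix a command so it runs as the _graphite user (assumes a sudo rule exists)."""
    return ["sudo", "-u", GRAPHITE_USER, *argv]


def info_cmd(metric):
    return as_graphite("whisper-info", str(metric_path(metric)))


def merge_cmd(old_metric, new_metric):
    # whisper-merge copies datapoints from the first file into the second.
    return as_graphite(
        "whisper-merge", str(metric_path(old_metric)), str(metric_path(new_metric))
    )


def set_aggregation_cmd(metric, method):
    if method not in AGGREGATION_METHODS:
        raise ValueError(f"unknown aggregation method: {method!r}")
    return as_graphite(
        "whisper-set-aggregation-method", str(metric_path(metric)), method
    )


# Usage with hypothetical metric names: print the commands for review before running.
print(" ".join(info_cmd("example.team.requests")))
print(" ".join(merge_cmd("example.team.old_name", "example.team.new_name")))
```

Building the command first and printing it makes it easy to log what was run and to double-check paths before touching data.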

I can easily get WMDE manager sign-off, but I'm not really sure who from the WMF would have to (or want to) sign off on this.

Hrm, good question. What team would be using/benefiting from this work on the wmf side, any idea?

Addshore added a comment. · Edited · Nov 12 2018, 3:42 PM

As it is Graphite, the only team I think it relates to at all is Operations, really.

Ok, let me poke @mark and see if he'll do that (or think of a better wmf contact).

RobH assigned this task to fgiunchedi. · Nov 19 2018, 5:53 PM
RobH added a subscriber: RobH.

Filippo volunteered to review this during our SRE team meeting, reassigning.

fgiunchedi removed fgiunchedi as the assignee of this task. · Nov 22 2018, 10:56 AM
fgiunchedi added a subscriber: fgiunchedi.

I've talked to @Addshore to clarify the work involved, and it looks good to me. Implementation-wise, I think we can take the sudo route, allowing certain groups to execute commands as the _graphite user. Alternatively, we can put users in the _graphite unix group, though I prefer the sudo route as it is easier to audit IMHO.

AFAICS we still need various signoffs but otherwise good to go on my side!

Quick question, Filippo: you mention "allowing certain groups". Do you know of some in particular, or would a new one have to be created explicitly for this?

Good question, I think we can use graphite-admins and extend the group privileges to run commands (maybe a shell?) as _graphite. The group already includes Ian Marlier though I think we're ok anyways.

jcrespo claimed this task. · Nov 26 2018, 6:11 PM

@Addshore, @fgiunchedi's plan was approved at today's SRE meeting.

The following was not a condition for it, but it was noted that this will technically allow mistakes (e.g. data deletion) on data beyond what you directly manage, so best practices (logging, coordinating with ops, etc.) should be followed. It was also noted that we trust you to do precisely that, so please don't prove us wrong. :-)

I will prepare a commit, but I would be happy if Filippo could support me by reviewing it, as he may know our Graphite implementation better.

Change 476558 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] admin: Add addshore to graphite-admins; allow _grahite commands

https://gerrit.wikimedia.org/r/476558

jcrespo reassigned this task from jcrespo to fgiunchedi. · Nov 30 2018, 12:06 PM

Reassigning to prevent this from being forgotten while I am on vacation and unresponsive (the patch should be fine as is).

Change 476558 merged by Effie Mouzeli:
[operations/puppet@production] admin: Add addshore to graphite-admins; allow _graphite commands

https://gerrit.wikimedia.org/r/476558

jijiki updated the task description. · Dec 3 2018, 12:15 PM
jijiki added a subscriber: jijiki.

@Addshore Please ensure that your access on graphite hosts is alright

Addshore closed this task as Resolved. · Dec 3 2018, 12:52 PM

I can confirm I can access the hosts and the _graphite user.