Page MenuHomePhabricator

add performance team members to webserver_misc_static servers to maintain sitemaps
Closed, ResolvedPublic

Description

As described by @BBlack in today's SRE meeting, members of the performance team (perf-team / @Imarlier ) also need access to webservers hosting misc static sites (currently bromine and vega) because it now hosts sitemap.wikimedia.org

This can be achieved by adding the perf-roots admin group to the puppet role webserver_misc_static in hierdata/role/common/. and would give them root access on bromine.eqiad.wmnet and vega.codfw.wmnet

This is replacing the access formerly requested in T201350 and the reason for all this is to work on T199252 and the other sitemap-related tickets linked from there.

Up further thought the releases servers (releases1001/2001) are better suited for this since they can fulfill all the requirements and already have multiple groups for releasers/uploaders to upload files.

Event Timeline

Dzahn created this task.Aug 27 2018, 5:21 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 27 2018, 5:21 PM

Change 455602 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] admin: add perf-roots to webserver_misc_static

https://gerrit.wikimedia.org/r/455602

Dzahn added a subscriber: Imarlier.

@Dzahn I think you meant perf-team, not perf-roots. The perf-team group is for services owned/maintained by Performance that all team members have access to.

The perf-roots group is a smaller restricted group granting wide root for the Varnish/MW production infrastructure.

Dzahn renamed this task from add perf-root admins to webserver misc static servers to add perf-team admins to webserver misc static servers.Aug 27 2018, 5:44 PM
Dzahn updated the task description. (Show Details)

Ok, thanks. Renamed the ticket and amended the patch accordingly.

Dzahn added a comment.Aug 27 2018, 5:46 PM

for the record. "perf-team" also means "root access" in this context.

Imarlier moved this task from Inbox to Radar on the Performance-Team board.Aug 27 2018, 8:03 PM
Imarlier edited projects, added Performance-Team (Radar); removed Performance-Team.

Does everyone on the team need it or or just @Imarlier ? Do they need root to get files into place or are some other permissions sufficient? What else do they need to do on those boxes?

ArielGlenn triaged this task as Normal priority.Aug 28 2018, 11:19 AM
Dzahn added a comment.Aug 28 2018, 2:18 PM

They need to upload files to sitemap.wikimedia.org's document root. I think we always want at least 2 people to avoid SPOFs. Correct me if i miss something else, Ian.

@ArielGlenn For the moment, I'm the one working on this, but I'm not sure that will continue to be the case so I'd prefer if it were the entire perf-team that had access.

I don't know how these boxes are configured, so I can't really say what access is needed (root vs. other). We will need to be able to move files into the directory from which sitemaps.wikimedia.org is being served, and to delete files from that directory.

(In other words, pretty much what @Dzahn wrote while I was writing my comment :-) )

Dzahn added a comment.Aug 28 2018, 2:22 PM

I'm thinking whether sitemap.wm.org should have been on the releases servers rather than here. Same kind of minimal Apache but we already have separate groups for releasers to upload stuff without needing root. We could certainly create releasers-sitemaps along to releasers-wikidiff2 etc.

There's where we would like sitemap.wm.o to be and where the various workarounds wll not be a PITA if we have it there.

The directory you want to be able to write into is /srv/org/wikimedia/sitemaps/ and I don't see why we can't just make it group wikidev and writeable by that group. But maybe you will want to look at logs and other things?

I'm thinking whether sitemap.wm.org should have been on the releases servers rather than here. Same kind of minimal Apache but we already have separate groups for releasers to upload stuff without needing root. We could certainly create releasers-sitemaps along to releasers-wikidiff2 etc.

The constraints on location are:

  • Needs to be accessible by Varnish (since varnish will be serving requests for, eg, https://it.wikipedia.org/sitemap.xml from http://sitemaps.wikimedia.org/it.wikipedia.org/sitemap.xml). Sitemaps are only valid if served from the same domain as the mapped URLs, so this isn't really fungible.
  • Needs to be accessible for manual addition/removal of sitemaps while we're testing this. Because there's the potential for sitemaps to significantly impact discoverability of content (either positively or negatively), we're intending to be very careful about testing this.
  • Assuming that discoverability is improved by the addition of sitemaps, we're going to want to automate sitemap generation. Sitemaps are generated by a maintenance script, currently being run by hand on mwmaint1001.

I'm really agnostic about where things live, as long as they meet these requirements.

Pinging @BBlack again so we can firm up the final retsing place for this thing, or at least put the final nail in its coffin :-D

Dzahn renamed this task from add perf-team admins to webserver misc static servers to add perf-team admins to releases servers (was: webserver misc static servers).Aug 28 2018, 6:18 PM
Dzahn updated the task description. (Show Details)

On the whole releases vs microsites bit:

  • Both are behind varnish so they work for @Imarlier 's first constraint.
  • Reasons to dislike releases: Releases currently hosts everything under https://releases.wikimedia.org/ (+ a jenkins-releases hack). It's not really set up in puppetization to support arbitrary new vhosts per new things we stick there, but we could add sitemaps into the main config somewhere anyways. It also doesn't logically fit (these aren't software releases).
  • Reasons to like releases: apparently simpler perms mgmt? Although I think we can easily configure permissions either way (e.g. wikidev perms idea on microsites). The perms issue is probably temporary (see more below).
  • Reasons to dislike microsites: It's not really a microsite by some definition, and the current microsites servers have very limited storage under /srv which probably won't be enough for the eventual full sitemaps data for all wikis (but according to some earlier IRC chat, this is very easy to bump since it's virtual storage anyways).
  • Reasons to like microsites: Puppetization was trivial, and microsites seems to be the closest thing we have to a puppetized solution for many disparate random small sites with unique names.

Going a little further on other concerns that might impact thinking:

  • In the short-to-medium term, this data is manually generated and copied over. This is because the concept is still being worked out, and the implications of bad sitemap data could be pretty awful SEO, so there might need to be manual hacks and/or quick corrections until we're sure we've got it right. Hence the need for manual access and permissions for the perf-team to upload new stuff generated by evolving scripts on mwmaint1001.
  • In the long term, eventually the scripts on mwmait1001 will be fully-automated and we'll want to rsync over new data updates periodically for all wikis. At that point we don't necessarily need write perms on the hosts for perf-team anymore. Before we get to that point, though, we'll have to also ensure we have enough storage for all the sitemaps too.

I don't think in either case we have to worry about things like direct access to access logs on the servers, etc. Since the traffic flows through varnish, the normal analytics tools can be used to look at the requests.

Really, if we're going all out on perfect design choices, this should probably be its own separate set of virtual servers with an independent puppetization from both releases and microsites. However, I don't think the perfect should be the enemy of the good here. This is a fairly urgent issue (working on the sitemaps data itself, testing it w/ Google, and undoing the Google-carnage caused by the copyright protests), and we can always decide on a better location and factoring for this at a later point after initial test work on a couple wikis is done and we have a better total size estimate, before we turn on full automation for all wikis and generate lots of data updates routinely (also to sort out by then: caching policy issues).

For now, I'd go with whichever direction gets write access for perf-team to working sitemaps directories easier/quicker (sounds like wikidev perms on microsites, to me, but open to whatever!) to continue testing.

Dzahn added a comment.Aug 29 2018, 7:18 PM

Thanks for the detailed answer, bblack. Alright, let's go with the easier solution of not moving the site another time. So we'll add the admin group on the microsites hosts as the ticket was named originally.

We should still add a new group to own the files. I'll unblock that.

Change 456279 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] admins: create new group sitemap-admins for sitemap uploaders

https://gerrit.wikimedia.org/r/456279

Change 456279 merged by Dzahn:
[operations/puppet@production] admins: create new group sitemaps-admins for sitemaps uploaders

https://gerrit.wikimedia.org/r/456279

Change 455602 merged by Dzahn:
[operations/puppet@production] admin: add sitemap-admins to webserver_misc_static

https://gerrit.wikimedia.org/r/455602

Change 456283 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] microsites::sitemaps: set sgid bit on sitemaps doc root

https://gerrit.wikimedia.org/r/456283

Change 456283 merged by Dzahn:
[operations/puppet@production] microsites::sitemaps: set sgid bit on sitemaps doc root

https://gerrit.wikimedia.org/r/456283

Change 456285 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] microsites::sitemaps: use 'recurse => true' to manage permissions

https://gerrit.wikimedia.org/r/456285

Change 456285 merged by Dzahn:
[operations/puppet@production] microsites::sitemaps: use 'recurse => true' to manage permissions

https://gerrit.wikimedia.org/r/456285

Dzahn renamed this task from add perf-team admins to releases servers (was: webserver misc static servers) to add performance team members to webserver_misc_static servers to maintain sitemaps.Aug 29 2018, 9:41 PM
Dzahn added a comment.Aug 29 2018, 9:46 PM

This should be resolved now:

  • sitemaps-admins is a group that has all of perf-team as members (but we are not using perf-team itself, this also made it easier because without adding root/sudo we don't have to go through another SRE meeting
  • sitemaps-admins has been added to webserver_misc_static_role
  • the sitemaps document root is group-owned by sitemaps-admins
  • all files in it, including the existing ones are writable by the group and puppet manages it recursively
  • directory has sgid bit set so new files will be owned by group, also done in puppet

@Imarlier You can now ssh to the hosts: bromine.eqiad.wmnet and vega.codfw.wmnet and you can write to /srv/org/wikimedia/sitemaps/ and overwrite the files inside it. The same is true for the other users in your team.

Once the user(s) verify that uploads work as expected, we can close this ticket.

Same issue as https://phabricator.wikimedia.org/T202650#4541158; Aaron has global root and can access these hosts without sitemaps-admins.

Which means we need to remove something or other; @Imarlier and/or @aaron, which should go?

@Dzahn @ArielGlenn Confirmed, can connect and write to that dir. Thanks!

@MoritzMuehlenhoff @ArielGlenn sitemaps-admins. Perf team (and Aaron, specifically) spend a lot of time debugging production issues. Removing access would have a seriously negative impact on our ability to do our job.

The issue is that there are duplicate permissions granted by virtue of the two roles perf-team and ops; the same goes for perf-roots and ops. There's overlap. He already has all perms by virtue of being in the ops group.

The script that checks for this is /usr/local/bin/cross-validate-accounts and the output it gives us is
Malformed membership for ops user aaron, has additional group(s): set(['perf-team', 'perf-roots'])

So it's not that we want to deny access, just remove the duplicate part of it. Hope that's clearer.

@ArielGlenn Gotcha, that helps. I think you can remove ops. (I don't have any sense of what that group might give access to that perf-roots and perf-team do not, but hypothetically they should be similar enough (if not entirely overlapping) that removing him from ops shouldn't meaningfully change anything.

OK just to clarify a bit more, 'ops' gives global root on everything. So before deciding that, let's make sure Aaron doesn't use root on some other boxes that performance-team/performance-roots doesn't give. And then I'm down (or anyone else who gets to it before I clock back in tomorrow, since I'm typing past my work-bedtime ;-) )

aaron added a comment.Aug 31 2018, 8:21 AM

perf-roots seems appropriate. If anything extra is needed, that can always be discussed in the future (probably by adding to perf-roots).

@aaron: To clarify/confirm: You don't need cluster-wide root access anymore? The only reason we have this discussion because we were (accidentally) added to the new groups created for the performance team. But none of those actually add anything to your privileges as you already have global root. So please either confirm that

  1. you want to keep your existing global root access (which you might be using for debugging during outages) so that we strip the superfluous performance team groups
  2. you don't actually need global root anymore; then we'd keep you in the performance team groups and remove the global root access.

@MoritzMuehlenhoff Based on Aaron's comment, you are correct in your understanding. You can go ahead with those permissions changes.

Change 456663 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] admins: remove aaron from ops

https://gerrit.wikimedia.org/r/456663

Change 456663 merged by Dzahn:
[operations/puppet@production] admins: remove aaron from ops

https://gerrit.wikimedia.org/r/456663

Dzahn closed this task as Resolved.Aug 31 2018, 10:08 PM
Dzahn claimed this task.

This should conclude the ticket. Please reopen if any remaining issues.

We now have: Membership of ops group in LDAP and YAML are not identical (from accountcheck). This should be fixed.

ArielGlenn reopened this task as Open.Sep 1 2018, 5:23 AM
Dzahn added a comment.Sep 2 2018, 3:25 PM

in LDAP: removed him from the 'ops' group. I meant to add him to "wmf" instead but he was already a member in that. Given that all the web-based access is "ops or wmf or nda" that i know of this should not change anything regarding actual access but solve the issue above.

ArielGlenn closed this task as Resolved.Sep 3 2018, 7:38 AM

I got no new emails from the account checker so it looks good! Closing for real this time.