Deploy "Striker" Tool Labs console to WMF production
Closed, Resolved; Public

Description

Striker is a Django WSGI application with quite a few Python packages as dependencies. It needs to be deployed in the WMF production cluster so that it can securely access the LDAP directory for auth-bind authentication. It will also need access to a MySQL/MariaDB database server. Having memcached available would be useful as well, so that session storage can be shared across redundant servers.

Security review: T135784: Security review of Tool Labs console application

Needed services

  • LDAP access (read + auth-bind; writes planned for future)
  • MySQL/MariaDB (small collection of tables to track authentication, git repos)
  • Conduit API access to phabricator.wikimedia.org
  • memcached
  • pybal? (can probably get away with direct varnish LB if pybal is difficult)
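
As a rough illustration of how the MySQL/MariaDB and memcached items above map onto Django configuration, here is a minimal sketch. It is not taken from the Striker source: the `build_settings` helper, the `STRIKER_DB_*` environment variables, the database name, and the `LDAP_URI` key are all hypothetical. The backend paths are standard Django ones; `PyMemcacheCache` needs Django 3.2 or later, and older releases shipped a different memcached backend.

```python
"""Hypothetical settings sketch; every host and name here is a placeholder."""
import os


def build_settings(db_host, memcached_hosts, ldap_uri):
    """Return Django-style settings for the services listed in the task."""
    return {
        # MySQL/MariaDB holds the small set of application tables.
        "DATABASES": {
            "default": {
                "ENGINE": "django.db.backends.mysql",
                "NAME": "striker",  # assumed database name
                "HOST": db_host,
                "USER": os.environ.get("STRIKER_DB_USER", "striker"),
                "PASSWORD": os.environ.get("STRIKER_DB_PASSWORD", ""),
            }
        },
        # One memcached pool shared by every application server.
        "CACHES": {
            "default": {
                "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
                "LOCATION": list(memcached_hosts),
            }
        },
        # Cache-backed sessions let any redundant server handle any request.
        "SESSION_ENGINE": "django.contrib.sessions.backends.cache",
        # Not a Django built-in setting; an illustrative key for the LDAP server.
        "LDAP_URI": ldap_uri,
    }
```

In a real settings module these would be module-level names, for example via `globals().update(build_settings(...))`.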

Future services

  • LDAP write access
  • Elasticsearch (ideally the Tool Labs hosted ES cluster to make sharing data with tools easier)
  • Nova API
  • TOTP validation (either access to labswiki db on silver or move seeds into LDAP)
  • k8s task status/monitoring API
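
For context on the TOTP item, validation only needs the shared seed and the current time, wherever the seed ends up being stored. The following is a generic RFC 6238 sketch using only the standard library. It is not Striker or wikitech code, and the function names, the 30-second step, and the one-step drift window are assumptions.

```python
import hashlib
import hmac
import struct
import time


def totp(secret, at, step=30, digits=6):
    """Compute the RFC 6238 code (HMAC-SHA1) for the time step containing `at`."""
    counter = int(at) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation, as in RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret, candidate, at=None, window=1, step=30):
    """Accept a code from the current step or up to `window` steps either side."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        moment = now + drift * step
        if moment < 0:
            continue  # no time step exists before the epoch
        expected = totp(secret, moment, step=step)
        if hmac.compare_digest(expected.encode(), candidate.encode()):
            return True
    return False
```

The drift window trades a little security for tolerance of clock skew between the server and the user's device.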

Host sizing

Local disk and processor needs should be light. The current and planned features do not call for any local resource storage. Most activity will consist of MySQL queries and/or calls to other external APIs. This might be a good candidate for deployment in a container/VM if that fits with the other network access needs.

Open questions

  • What do we call this thing? "Striker" is a codename for the software. In the Tool Labs vision document I called the service console.wmflabs.org. This probably isn't the best name either as at least in the short/mid-term this application will be focused on Tool Labs rather than Labs generally. Perhaps toolsadmin or toolmanager or something similar?
  • Packaging! Everybody's favorite problem. Can we use wheels and Scap or do we need to figure out how to make debs for all the dependencies?
  • Location, location, location. Where on the network should this live? Can this be deployed in the ganeti cluster or does it need bare metal for some reason (like access to things that are both inside and outside of the Labs environment)?

Related Objects

Status    Assigned
Resolved  LucasWerkmeister
Resolved  matmarex
Resolved  Legoktm
Resolved  Legoktm
Open      dcaro
Resolved  yuvipanda
Resolved  dcaro
Resolved  bd808
Resolved  bd808
Resolved  None
Resolved  bd808
Resolved  bd808
Resolved  dpatrick
Resolved  bd808
Resolved  mmodell
Resolved  jcrespo
Resolved  bd808
Resolved  bd808

Event Timeline

> Future services:
> Nova API

Does it need to be in the labs support network then?

> What do we call this thing? "Striker" is a codename for the software. In the Tool Labs vision document I called the service console.wmflabs.org. This probably isn't the best name either as at least in the short/mid-term this application will be focused on Tool Labs rather than Labs generally. Perhaps toolsadmin or toolmanager or something similar?

On the other hand we don't want to be stuck with the tools-specific name if it's going to be expanded in scope later.

Security review is tentatively scheduled for July 5-15.

>> Future services:
>> Nova API
>
> Does it need to be in the labs support network then?

I was hoping someone could tell me that. @Andrew what are your thoughts?

>> What do we call this thing? "Striker" is a codename for the software. In the Tool Labs vision document I called the service console.wmflabs.org. This probably isn't the best name either as at least in the short/mid-term this application will be focused on Tool Labs rather than Labs generally. Perhaps toolsadmin or toolmanager or something similar?
>
> On the other hand we don't want to be stuck with the tools-specific name if it's going to be expanded in scope later.

In T128158#2114003 @yuvipanda seemed to prefer a Tool Labs only workflow. My personal focus for the app is going to be Tool Labs, but there is overlap with general Labs issues that could be handled by the same app in the future. Obvious points of overlap include:

  • LDAP account creation
  • SSH authorized keys management
  • possible TOTP two-factor auth management

If removing OpenStackManager from wikitech is a goal for the Labs team, then there will need to be some system for creating the LDAP accounts needed to enable Horizon and OpenStack usage. That is probably better discussed in detail on some other ticket, but if the discussion tends towards using Striker to provide a shared solution for account management, then I would agree that a less Tools-specific name would be the right choice. DNS entries are cheap, though, so we could start with toolsadmin.wikimedia.org and then rebrand later if and when there is convergence with Labs-wide concerns. I think using a wmflabs.org name is ruled out by being hosted outside of the Labs projects, but I may be wrong about that.

That comment of mine was far less clear than I intended. What I meant was that the tools-specific functionality should be limited to only the tools project, without attempting to build something that would theoretically allow anyone to build something tools-like (like service groups). I agree that all the things you listed seem like reasonable things to put in here, assuming we can provide a clear line for 'what is in striker, what is in horizon'.

We could theoretically have a .wmflabs.org domain, but I'm not sure if that's a good idea.

In a discussion on IRC, @yuvipanda suggested that this application should live in the labs support VLAN, and that we could probably either find a misc spare to deploy it on or install it on a labservices host.

If we co-locate it on labservices then it should probably have an apparmor profile to contain the impact of any undiscovered vulnerabilities. That level of protection may be useful even in an isolated deployment.

Deployment with scap3 and a git repo of wheels dependencies seems to be the obvious choice for now. Future improvements for wheels distribution may benefit Striker, ORES and other Python-based apps.
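
To make the wheels idea concrete: once the wheels repo is checked out on the target host, the dependencies can be installed without reaching any package index. The sketch below only shows that one step and is not scap3's actual mechanism. The helper names and paths are hypothetical; `--no-index` and `--find-links` are standard pip options.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_command(wheel_dir, requirements):
    """Build a pip command that resolves packages only from a local wheel dir."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                    # never contact PyPI
        "--find-links", str(wheel_dir),  # look for wheels in this directory
        "-r", str(requirements),
    ]


def install_from_wheels(wheel_dir, requirements):
    """Run the install inside the current interpreter's environment."""
    subprocess.run(pip_install_command(wheel_dir, requirements), check=True)
```

A deploy step might call `install_from_wheels(Path("wheels"), Path("requirements.txt"))` from within the application's virtualenv. The wheels must have been built for the target distribution and Python version.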

I think toolsadmin.wikimedia.org is the right hostname for initial deployment. If and when Striker gains features that are generally useful for Labs we can easily add an additional CNAME that is more generic.

> Deployment with scap3 and a git repo of wheels dependencies seems to be the obvious choice for now. Future improvements for wheels distribution may benefit Striker, ORES and other Python-based apps.

http://platter.pocoo.org/dev/ looks neat. We could build something like that right into scap3.

Gerrit repos for labs/striker and labs/striker/wheels requested on mw.o.

bd808 moved this task from Backlog to Ready on the Striker board.

Based on my reading of https://wikitech.wikimedia.org/wiki/MariaDB/misc I think the Striker database should probably live on the m5 shard when we get that far.

@Andrew and I talked, and we would like to put this on californium for now. It serves a similar function to Horizon, and that node has a similar role and the resources to get this going. This seems to require a request to the DBAs.

Change 305141 had a related patch set uploaded (by BryanDavis):
Provision Tool Labs admin console (Striker) on Californium

https://gerrit.wikimedia.org/r/305141

Change 305142 had a related patch set uploaded (by BryanDavis):
Add toolsadmin.wikimedia.org to misc varnish

https://gerrit.wikimedia.org/r/305142

Change 305143 had a related patch set uploaded (by BryanDavis):
Add toolsadmin.wikimedia.org

https://gerrit.wikimedia.org/r/305143

Change 305143 merged by Yuvipanda:
Add toolsadmin.wikimedia.org

https://gerrit.wikimedia.org/r/305143

Change 305141 merged by Yuvipanda:
Provision Tool Labs admin console (Striker) on Californium

https://gerrit.wikimedia.org/r/305141

Change 306582 had a related patch set uploaded (by BryanDavis):
Add service name californium8044 for californium.wikimedia.org

https://gerrit.wikimedia.org/r/306582

Change 306582 abandoned by BryanDavis:
Add service name californium8044 for californium.wikimedia.org

Reason:
This is gross. We'll fix it another way.

https://gerrit.wikimedia.org/r/306582

Change 306604 had a related patch set uploaded (by BryanDavis):
striker: Replace nginx with apache

https://gerrit.wikimedia.org/r/306604

On 2016-08-24, @yuvipanda and I tried to get Striker deployed onto californium in the WMF production cluster. While doing so we ran into a few small issues:

  • Changes to service::uwsgi were not compatible with usage of the define by ORES.
  • Californium is a Trusty host, and the striker classes expected Jessie
    • Resolved by rebuilding the python wheels for Trusty and changing the Puppet guard conditions
  • Port 80 on californium is owned by Apache, not nginx
    • Originally we worked around this by putting nginx on port 8044.
    • Keeping nginx would require adding a service name for californium so that it could be used as a backend for Varnish (T138546). Patches were prepared to do this, but they didn't seem to be the proper course of action.
    • Instead, striker will replace nginx with Apache as the fronting reverse proxy and static asset serving mechanism (https://gerrit.wikimedia.org/r/#/c/306604/).

We will attempt to complete the deployment with these changes on 2016-08-25.

Change 306604 merged by Yuvipanda:
striker: Replace nginx with apache

https://gerrit.wikimedia.org/r/306604

Change 305142 merged by Yuvipanda:
Add toolsadmin.wikimedia.org to misc varnish

https://gerrit.wikimedia.org/r/305142

https://toolsadmin.wikimedia.org/

Many thanks to @mmodell for Phab magic, @yuvipanda and @jcrespo for techops support, @dpatrick for code review, and @Tnegrin and @kaldari for letting me work on this. Now it's time to start fixing bugs and adding more features!