
Find an alternative solution for the mysql-proxy in PAWS
Closed, Duplicate (Public)

Description

The db-proxy image for PAWS makes a fairly simple database connection to the wiki replicas using its own tool auth and a shared key with the JupyterHub application. This allows people to use wiki replica connections in their notebooks, for example https://paws-public.wmflabs.org/paws-public/User:Jtmorgan/ds4ux/paws-cheatsheet.ipynb

Interestingly, the database auth Lua script doesn't seem entirely necessary, because the pods have credentials in the env (and the connection fails without them, at least with pymysql). I may try locally with a simple haproxy or an ExternalName service.
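To illustrate the point about credentials in the env, here is a minimal sketch of how a notebook could connect with pymysql using only environment variables. The variable names (`MYSQL_HOST`, `MYSQL_USERNAME`, `MYSQL_PASSWORD`) are hypothetical placeholders, not confirmed names from the PAWS pods.

```python
import os

# Hypothetical variable names; the real pod environment may use different ones.
ENV_KEYS = {
    "host": "MYSQL_HOST",
    "user": "MYSQL_USERNAME",
    "password": "MYSQL_PASSWORD",
}


def connection_kwargs(env=os.environ):
    """Collect connection settings from the environment, failing loudly if any are absent."""
    missing = [name for name in ENV_KEYS.values() if not env.get(name)]
    if missing:
        raise KeyError("missing credentials in environment: " + ", ".join(missing))
    return {key: env[name] for key, name in ENV_KEYS.items()}


def connect(database, env=os.environ):
    """Open a pymysql connection to the given database using the env credentials."""
    import pymysql  # imported lazily so connection_kwargs() works without the driver

    return pymysql.connect(database=database, **connection_kwargs(env))
```

Something like `connect("enwiki_p")` would then go through whatever sits at the configured host, whether that is the current proxy or a plain forwarding service.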

Event Timeline

Bstorm triaged this task as Medium priority. May 19 2020, 5:25 PM
Bstorm created this task.

This is not a blocker for the upgrade because the image still builds.

I have no idea whether any blocks or even patrolling were ever done based on the Lua-injected user data, but I think that was a criterion for making the wiki replicas wide open to this potential abuse vector.

From https://wikitech.wikimedia.org/wiki/Help:Toolforge/Rules:

  1. Do not provide direct access to Cloud Services resources to unauthenticated users

    For instance, do not allow web clients to issue shell commands or arbitrary SQL queries against the databases. Cloud Services resources are shared and limited, and it must be possible to attribute usage to specific LDAP users who are bound to the terms of use. Toolforge admin vetted Tools which include substantial anti-abuse and attribution information, such as PAWS and Quarry, are allowed.

Reading over the proxy, I definitely agree there. It at least identifies who is who rather than just proxying along as itself. That makes it trickier to get rid of than a simple service or redirect would be.

This definitely is going to need more work than a straight replacement, per T260389: Redesign and rebuild the wikireplicas service using a multi-instance architecture.
I think I'll dub this task the "PAWS subtask" of that.

What happened to this?

Details ended up on T276284: Establish a working setup for PAWS with multi-instance wikireplicas. We are now issuing per-user database credentials in a way that is very much like what Toolforge has done for years. See also https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#How_should_I_connect_to_databases_in_PAWS?
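As a rough sketch of what a per-user connection might look like in a notebook: the per-wiki hostname pattern and the `~/.my.cnf` credentials path below are assumptions modeled on Toolforge-style conventions, and `replica_kwargs` is a hypothetical helper. The linked wikitech page is the authoritative source for the real values.

```python
import os


def replica_kwargs(wiki, cluster="analytics", cnf_path="~/.my.cnf"):
    """Build pymysql.connect() arguments for one wiki on the multi-instance replicas.

    Assumptions: hosts follow <wiki>.<cluster>.db.svc.wikimedia.cloud, the
    database is <wiki>_p, and per-user credentials live in cnf_path.
    """
    return {
        "host": f"{wiki}.{cluster}.db.svc.wikimedia.cloud",
        "database": f"{wiki}_p",
        # pymysql reads user and password from the [client] section of this file
        "read_default_file": os.path.expanduser(cnf_path),
    }


def connect(wiki):
    """Open a connection as the notebook's own user, with no shared proxy account."""
    import pymysql  # imported lazily so replica_kwargs() works without the driver

    return pymysql.connect(**replica_kwargs(wiki))
```

Because each user connects with their own credentials, usage can be attributed per user on the database side, without a proxy injecting that information.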

Yeah, the dbproxy pod is still live because I didn't want to break the few existing notebooks that the upgrade of the wikireplicas didn't break, but it'll need to be removed from the live system and the minikube build eventually.