Add a way to specify the database server to attach to so that queries can be written against:
Description
| Status | Subtype | Assigned | Task | ||
|---|---|---|---|---|---|
| Resolved | fnegri | T151158 Support queries against Quarry's own database and ToolsDB | |||
| Resolved | fnegri | T348407 Allow Quarry to query ToolsDB public databases | |||
| Resolved | BUG REPORT | fnegri | T365374 [bug] Access denied for user 'quarry'@'172.16.2.72' (using password: NO) | ||
| Resolved | fnegri | T367415 Allow Quarry to query its own database | |||
| Resolved | fnegri | T367393 Allow Superset to query ToolsDB public databases | |||
| Resolved | • rook | T372395 Improve idempotency detection with helm diff |
Event Timeline
Added tool' users database per user request at https://meta.wikimedia.org/wiki/Research_talk:Quarry#Tool_labs'_user_databases?.
This would be trivial to add to the work on the parent task I just added. Therefore, adding it.
I dug around a little bit and realized it isn't as simple as what I stated above. The reason is that those are read-write databases, which dramatically changes the scope of what Quarry could do. In order to provide such services, this would require a read-only permission to most tables on ToolsDB. It would also require a review of and possible view layer/read-only role for the Quarry database.
This cannot be fitted into the scope of the replica changes. It also may be surprising for developers if their ToolsDB information became dramatically more public than it currently is, but that's more of a topic for discussion.
There's a clear path to adding database patterns to talk to toolsdb. The quarry database is how quarry works in dev mode, so it clearly works. However, I think it should be done with the read-only user used elsewhere. That's all pretty easy. It should also probably be done in two stages (one patch for toolsdb and one for self-query) so that we can roll back something that is found to be problematic.
I think letting users query public toolsdb databases has clear value (but not sure of the utility of querying quarry's internal db).
As the database selector field exists now, we could just check if the dbname has double underscores and ends in _p – that should indicate a tools database. Credentials used for connecting must not have write access to any database.
We are finally close to resolving this task! After the work in T348407 I can successfully query ToolsDB from Quarry. Access at the moment is limited to one database for testing (s55771__wsstats_p). As discussed in T348407 I will send an email to cloud-announce to inform everyone that we're opening this type of access to the ToolsDB databases (only for the databases ending with _p).
I will give people a 2-week notice for this change, and enable access to all _p databases on Monday, July 1st. If you would like to have access to a specific database that you are managing and would like to have it enabled before that date, just leave a comment below.
I also created T367393: Allow Superset to query ToolsDB public databases to add a similar type of access from Superset.
This task originally mentioned both access to ToolsDB and access to "The 'local' database that is used to track Quarry's saved queries". Is access to Quarry's own database something that anyone is still interested in? Edit: I split this requirement into its own task T367415: Allow Quarry to query its own database
All the subtasks are now completed and you can use Quarry to query ToolsDB's public dbs and Quarry's internal db.
I will keep this task open until I update the related documentation pages.
I have updated all the documentation I could find, let me know if I missed anything:
- https://meta.wikimedia.org/wiki/Research:Quarry
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services
- https://wikitech.wikimedia.org/wiki/Portal:Data_Services/Admin/Quarry
- https://wikitech.wikimedia.org/wiki/Superset
I am marking this task as Resolved, after 2,873 days from its creation. 😺
@UOzurumba apologies for the delay. Yes, it's a good idea to add it to the next Tech News! Here's a summary, feel free to edit it:
ToolsDB's public databases are now accessible from both Quarry and Superset.
Those databases have always been accessible to every Toolforge user, but they are now more broadly accessible, as Quarry can be accessed by anyone with a Wikimedia account.
In addition, Quarry's internal database can now be queried from Quarry itself. This database contains information about all queries that are being run and starred by users in Quarry. This information was already public through the web interface, but you can now query it using SQL.