Page MenuHomePhabricator

Explore 'Analyze' statement as substitute for Explain
Closed, DeclinedPublic

Description

It's possible that as part of the new server(s) rollout and upgrading to mariadb 10.1.x we could facilitate the intention with

https://mariadb.com/kb/en/mariadb/analyze-statement/

EXPLAIN is thwarted at the moment by the use of views as a security measure.

Event Timeline

chasemp triaged this task as Medium priority.Jul 22 2016, 1:32 PM
chasemp created this task.
Danny_B removed a subscriber: Danny_B.Jul 23 2016, 2:28 AM
chasemp added a comment.EditedAug 1 2016, 2:19 PM

The idea w/ associating with T140788 is we want to approach this issue as part of the rearchitecting. Removing this as a subtask removes this part of the outcome.

edit: If people find it more helpful to reframe as "explore analyze for users" or something then ok. The historical task this was merged as a duplicate of is not a real duplicate. Closing the legacy task would make more sense to me if we want to do anything.

chasemp renamed this task from Users can't run EXPLAIN queries to check the theoretical efficiency of their SQL to Explore 'Analyze' statement as substitute for Explain.Aug 1 2016, 2:27 PM

10.1 Analyze requires also SELECT grants for the underlying tables, so it cannot be used. This doesn't take away from the fact that there is already a workaround (SHOW EXPLAIN), and we should a) document it better b) maybe make it easier to run it (which technically, is already available on https://tools.wmflabs.org/tools-info/optimizer.py ).

@jcrespo what do you think of a http service that is hosted on hardware somewhere and maintained by the labs team, that lets you send it a query and gives you back the EXPLAIN output? This could be given enough rights to perform an ANALYZE. It could possibly be risky, since we'll have to prevent SQL injection in this service.

That is a good idea! It would be really useful and easy to use.
Apart from setting the right grants, we can also try to set up some sort of chroot or a "fake" mysql server just in case there are some SQL injection cases we couldn't foresee. Also, I assume that your idea would also run over already sanitized data?

Yep, it would!

Can you explain what you mean by 'fake' mysql server?

simple API would be:

POST to labsdb-analytics.eqiad.wmnet/v1/analyze (or labsdb-web.eqiad.wmnet/v1/analyze) your SQL query, and it will return ANALYZE output raw. This can be a tiny python service I can write, and we can run it on each of the labsdb hosts. This will allow us to use a mysql user account that can only connect via localhost, further limiting the privileges for it. If you'd rather not have this run on the db hosts, we can run it on one of the service hosts we have, and use a different URL (or just add to the haproxy config).

Yep, it would!

Can you explain what you mean by 'fake' mysql server?

If we are not completely sure about grants and possible SQL injection, we can always generate fake data to populate the tables, after all all people will need would be the EXPLAIN output, not really caring about the data at the moment they are optimizing queries.
I might be too paranoid with the security :-)

jcrespo added a comment.EditedDec 22 2016, 9:19 AM

@jcrespo what do you think of a http service that is hosted on hardware somewhere and maintained by the labs team, that lets you send it a query and gives you back the EXPLAIN output? This could be given enough rights to perform an ANALYZE. It could possibly be risky, since we'll have to prevent SQL injection in this service.

I do not like it, subqueries on MariaDB still get executed fully with EXPLAIN (I would assume the same will happen with analyze, which basically is an EXPLAIN, and you can extract slowly, but ok, information from the EXPLAIN, by querying each id individually, knowing if a row exists or not (e.g. you could extract info one letter at a time). That is without taking into account any potential vulnerability of the code, which while I do not doubt of your skill, parsing SQL is not preciselly trivial. I would prefer to document https://tools.wmflabs.org/tools-info/optimizer.py or officially support it from labs/rewrite it without requiring extra privileges.

If we are not completely sure about grants and possible SQL injection, we can always generate fake data to populate the tables,

Unless you plan to create 10 TB of fake data (that in any case, it would not be identical to production/labs) the info you would get would not be accurate. Ok for testing mediawiki on how it will behave, not ok if the only thing you want is to see why a query is slow- remember that EXPLAIN will change its plan based on the actual data on-table (using an index or not, etc.).

@jynus ah, I didn't know that (re: subqueries). TIL.

I think in that case we should do the same service but with a SHOW EXPLAIN. API can remain the same. Fire off query, run SHOW EXPLAIN, then kill query. Does that sound better?

@jynus ah, I didn't know that (re: subqueries). TIL.

I think in that case we should do the same service but with a SHOW EXPLAIN. API can remain the same. Fire off query, run SHOW EXPLAIN, then kill query. Does that sound better?

Yes. If the service exists, we can even create a command line utility for tools with curl.

https://tools.wmflabs.org/tools-info/optimizer.py no longer works, and that is a problem for users wanting to EXPLAIN their queries.

1978Gage2001 moved this task from Triage to In progress on the DBA board.Dec 11 2017, 9:45 AM
Marostegui moved this task from In progress to Triage on the DBA board.Dec 11 2017, 10:58 AM
Marostegui closed this task as Declined.Oct 16 2018, 12:36 PM

I am going to decline as there are no plans to use analyze as a substitute of EXPLAIN. Instead show explain maybe in combination with https://tools.wmflabs.org/tools-info/optimizer.py should be done. T141095#2895716
If anyone feels this needs to be re-opened, please do so.

But for now https://tools.wmflabs.org/tools-info/optimizer.py is what we have.