Page MenuHomePhabricator

Include request id (if present) in a comment in DB queries
Open, MediumPublic

Description

Came up in a conversation the other day: We assign a request ID to incoming requests, in many cases. It would be great to include that as a comment in SQL queries (like we do with class/method names), so that when we identify slow requests we can correlate the SQL queries that were generated by that request.

@jcrespo curious what you think of this...crazy? Doable? Somewhere in the middle?

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 25 2018, 4:23 PM

I don't have any thoughts about this, neither good nor bad, mediawiki loadbalancer already adds the mediawiki function to the SQL comment to identify the origin: https://tendril.wikimedia.org/report/slow_queries?host=^db&user=wikiuser&schema=wik&hours=1 and MySQL/MariaDB already have a pseudo-standard way to summarize mysql query patterns with a digest:

root@db1052.eqiad.wmnet[sys]> SELECT query, digest FROM statement_analysis LIMIT 10;
+-------------------------------------------------------------------+----------------------------------+
| query                                                             | digest                           |
+-------------------------------------------------------------------+----------------------------------+
| SELECT `GET_LOCK` (...) AS `lockstatus`                           | 9a07945c091deaabda78ba0327ec1909 |
| SELECT `el_to` FROM `externallinks` WHERE `el_from` = ?           | 3526fe2d93402243c3653986882f33de |
| SELECT `pl_namespace` , `pl_ti ... agelinks` WHERE `pl_from` = ?  | 8ee6691d3dac58c77c41f6ff9c23a165 |
| COMMIT                                                            | ab648bfa17fa36e67f82321c0eb846cb |
| SELECT `tl_namespace` , `tl_ti ... atelinks` WHERE `tl_from` = ?  | 3bfa08845f033e90dcb0377b9a597ca7 |
| SELECT `user_id` , `user_name` ... er_id` = ? LIMIT ? FOR UPDATE  | b2eaa497fdf826873c67fb2daa7b7777 |
| SHOW GLOBAL VARIABLES LIKE ?                                      | 2c2b0d40c9af970951188c70d8765d39 |
| SELECT `cl_to` , `cl_sortkey_p ... orylinks` WHERE `cl_from` = ?  | e0fb2d642fdd3c1c20fca624852b4b80 |
| SELECT `wl_user` FROM `watchli ... ificationtimestamp` IS NULL )  | 0e696d8e5978797087e66597ba766e5b |
| SELECT `rev_id` , `rev_page` , ... id` = `page_latest` ) LIMIT ?  | deddb5b0a7e3a41e0fd3c437e516eab2 |
+-------------------------------------------------------------------+----------------------------------+
10 rows in set (0.07 sec)

I personally don't need anything else for performance tuning, but I have nothing against adding extra metadata on comments if that helps your team.

If you are open about other suggestions that would be useful to us DBAs is mediawiki logs generating a field related to the datacenter. While in most cases it can be taken from the mediawiki server using it, it may not be true in the future if cross-dc queries are a thing, plus multi-dc happens in a few months, to isolate errors only happening on 1 datacenter (e.g. there is lag on one database section, but only on one datacenter and not the other).

Vvjjkkii renamed this task from Include request id (if present) in a comment in DB queries to t8daaaaaaa.Jul 1 2018, 1:13 AM
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
Mainframe98 renamed this task from t8daaaaaaa to Include request id (if present) in a comment in DB queries.Jul 1 2018, 8:32 AM
Mainframe98 lowered the priority of this task from High to Medium.
Mainframe98 updated the task description. (Show Details)
Mainframe98 added a subscriber: Aklapper.
Tgr added a subscriber: Tgr.Sep 23 2018, 5:18 PM

This would be super nice. Can this interfere with query caching though? (We don't reuse prepared statements so probably there's no query caching anyway, but better to be sure.)

Krinkle added a subscriber: Krinkle.EditedSep 23 2018, 5:48 PM

This would be super nice. Can this interfere with query caching though? (We don't reuse prepared statements so probably there's no query caching anyway, but better to be sure.)

This is a good point to keep in mind. From what I understand, we don't employ nor have plans to employ caching at the layer of raw SQL text. We do have caching at the higher level (e.g. Memc/APC wrapper around a PHP method call), and the backend databases have their own hot caching of individual parts of tables and indexes.

We currently include the viewing user's name or IP address in several SQL queries already (through a prepended comment). We also include the function name of the PHP caller in many queries. While rare, there can be multiple different callers producing the same effective query.

The idea would be to extend the comment field to also include the request ID.

Restricted Application added a project: Platform Engineering. · View Herald TranscriptJul 23 2019, 5:07 PM

We'll get to this next quarter, maybe?