Page MenuHomePhabricator

Add a primary key to querycache table
Open, Stalled, Needs TriagePublic

Description

querycache needs a primary key; see T17441

Event Timeline

Change 345968 had a related patch set (by Paladox) published:
[mediawiki/core@master] Add primary key qc_type and qc_value

https://gerrit.wikimedia.org/r/345968

Reedy renamed this task from Give querycache a primary key to Add a primary key to querycache.Aug 29 2017, 9:45 PM
Reedy removed a subscriber: Tjlsangria.

We can't use qc_type, qc_value as a PK

mysql:wikiadmin@db1083 [enwiki]> select qc_type, qc_value, count(*) as cnt from querycache group by qc_type, qc_value HAVING cnt > 1 ORDER BY cnt DESC LIMIT 10;
+------------------------+----------+-------+
| qc_type                | qc_value | cnt   |
+------------------------+----------+-------+
|                        |        0 | 10000 |
| Unwatchedpages         |        0 |  5000 |
| Fewestrevisions        |        1 |  5000 |
| Listredirects          |        0 |  5000 |
| Lonelypages            |        0 |  4970 |
| Unusedtemplates        |        0 |  4602 |
| Uncategorizedtemplates |        0 |  3699 |
| Wantedtemplates        |        2 |  2744 |
| Wantedcategories       |        1 |  2245 |
| Wantedtemplates        |        3 |   996 |
+------------------------+----------+-------+
10 rows in set (0.06 sec)

Is this intended, or makes sense?

$ mysql -h dbstore2002.codfw.wmnet -P3311 enwiki -e "SELECT qc_type, qc_namespace, qc_title, count(*) as count FROM querycache qc1 JOIN querycache qc2 USING (qc_type, qc_namespace, qc_title) GROUP BY qc_type, qc_namespace, qc_title HAVING count(*) > 1"
...
|                 |            0 | "All-American_Boys"                                                           |   100 |
|                 |            0 | "All_My_Trials"                                                               |   100 |
|                 |            0 | "All_You_Zombies"                                                             |   100 |
|                 |            0 | "Allie"_Murray                                                                |   100 |
|                 |            0 | "Alon_Cohen"                                                                  |   100 |
|                 |            0 | "Alpha_Male''_Monty_Brown                                                     |   100 |
|                 |            0 | "Always"                                                                      |   100 |
...
| Disambiguations |            0 | IDB                                                                           |     9 |
| Disambiguations |            0 | Inflation_(disambiguation)                                                    |     4 |
| Disambiguations |            0 | Isostatic                                                                     |     4 |
| Disambiguations |            0 | Isotherm                                                                      |     4 |
| Disambiguations |            0 | Italian_dialects                                                              |     4 |
| Disambiguations |            0 | Ndebele_people                                                                |     4 |
| Disambiguations |            0 | Pythagorean                                                                   |     4 |
| Disambiguations |            0 | Red_Army_Choir                                                                |     4 |
| Disambiguations |          108 | Chemical_elements                                                             |     4 |
+-----------------+--------------+-------------------------------------------------------------------------------+-------+

... given the last col is the number of duplicates, not the value?

The qc_type being '' is definitely wrong, and I guess is an artefact of some historic bug, it doesn't seem to be on newer wikis. Filed T174513: Maintenance script to cleanup querycache to make MW clean this up, and then I'll look at sorting WMF production too (one way or another)

qc_value is rather arbitrary, per tables.sql Some sort of stored value. Sizes, counts... which isn't helpful

I'd presume that it makes no sense for any page to be listed under a specific qc_type multiple times

For example, for Disambiguations it's https://en.wikipedia.org/wiki/Special:DisambiguationPages - a page should only be listed once..

For some, it may make more sense, but I can't think of any at the moment...

Little bit more sane now

mysql:wikiadmin@db1055 [enwiki]> select distinct qc_type from querycache;
+-------------------------+
| qc_type                 |
+-------------------------+
| Ancientpages            |
| BrokenRedirects         |
| CrossNamespaceLinks     |
| Deadendpages            |
| DisambiguationPages     |
| Disambiguations         |
| DoubleRedirects         |
| Fewestrevisions         |
| GadgetUsage             |
| ListDuplicatedFiles     |
| Listredirects           |
| Lonelypages             |
| Longpages               |
| MediaStatistics         |
| Mostcategories          |
| Mostimages              |
| Mostinterwikis          |
| Mostlinked              |
| Mostlinkedcategories    |
| Mostlinkedtemplates     |
| Mostrevisions           |
| OrphanedTimedText       |
| Popularpages            |
| Shortpages              |
| Uncategorizedcategories |
| Uncategorizedpages      |
| Uncategorizedtemplates  |
| Unusedcategories        |
| Unusedimages            |
| Unusedtemplates         |
| Unwatchedpages          |
| Wantedcategories        |
| Wantedfiles             |
| Wantedpages             |
| Wantedtemplates         |
| Withoutinterwiki        |
| fr_unreviewedpages      |
| fr_unreviewedpages_q    |
+-------------------------+
38 rows in set (0.15 sec)
mysql:wikiadmin@db1055 [enwiki]> select qc_type, qc_value, count(*) as cnt from querycache group by qc_type, qc_value HAVING cnt > 1 ORDER BY cnt DESC LIMIT 10;
+------------------------+----------+------+
| qc_type                | qc_value | cnt  |
+------------------------+----------+------+
| Unwatchedpages         |        0 | 5000 |
| Listredirects          |        0 | 5000 |
| Fewestrevisions        |        1 | 5000 |
| Lonelypages            |        0 | 4970 |
| Unusedtemplates        |        0 | 4601 |
| Uncategorizedtemplates |        0 | 3694 |
| Wantedtemplates        |        2 | 2744 |
| Wantedcategories       |        1 | 2475 |
| Wantedtemplates        |        3 |  996 |
| Withoutinterwiki       |        0 |  854 |
+------------------------+----------+------+
10 rows in set (0.84 sec)

Is this intended, or makes sense?

$ mysql -h dbstore2002.codfw.wmnet -P3311 enwiki -e "SELECT qc_type, qc_namespace, qc_title, count(*) as count FROM querycache qc1 JOIN querycache qc2 USING (qc_type, qc_namespace, qc_title) GROUP BY qc_type, qc_namespace, qc_title HAVING count(*) > 1"
...
|                 |            0 | "All-American_Boys"                                                           |   100 |
|                 |            0 | "All_My_Trials"                                                               |   100 |
|                 |            0 | "All_You_Zombies"                                                             |   100 |
|                 |            0 | "Allie"_Murray                                                                |   100 |
|                 |            0 | "Alon_Cohen"                                                                  |   100 |
|                 |            0 | "Alpha_Male''_Monty_Brown                                                     |   100 |
|                 |            0 | "Always"                                                                      |   100 |
...
| Disambiguations |            0 | IDB                                                                           |     9 |
| Disambiguations |            0 | Inflation_(disambiguation)                                                    |     4 |
| Disambiguations |            0 | Isostatic                                                                     |     4 |
| Disambiguations |            0 | Isotherm                                                                      |     4 |
| Disambiguations |            0 | Italian_dialects                                                              |     4 |
| Disambiguations |            0 | Ndebele_people                                                                |     4 |
| Disambiguations |            0 | Pythagorean                                                                   |     4 |
| Disambiguations |            0 | Red_Army_Choir                                                                |     4 |
| Disambiguations |          108 | Chemical_elements                                                             |     4 |
+-----------------+--------------+-------------------------------------------------------------------------------+-------+

... given the last col is the number of duplicates, not the value?

I'm confused what this query is doing.. It's probably just tiredness... But why are you joining querycache against itself?

lang=sqlmysql:wikiadmin@db1055 [enwiki]> SELECT * FROM querycache qc1 JOIN querycache qc2 USING (qc_type, qc_namespace, qc_title) WHERE qc_title='Alexis_(disambiguation)';
+-----------------+--------------+-------------------------+----------+----------+
| qc_type         | qc_namespace | qc_title                | qc_value | qc_value |
+-----------------+--------------+-------------------------+----------+----------+
| Disambiguations |            0 | Alexis_(disambiguation) |     1601 |     1601 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1601 |     1614 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1601 |     1887 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1614 |     1601 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1614 |     1614 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1614 |     1887 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1887 |     1601 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1887 |     1614 |
| Disambiguations |            0 | Alexis_(disambiguation) |     1887 |     1887 |
+-----------------+--------------+-------------------------+----------+----------+
9 rows in set (0.10 sec)

mysql:wikiadmin@db1055 [enwiki]> SELECT * FROM querycache WHERE qc_title='Alexis_(disambiguation)';
+-----------------+----------+--------------+-------------------------+
| qc_type         | qc_value | qc_namespace | qc_title                |
+-----------------+----------+--------------+-------------------------+
| Disambiguations |     1601 |            0 | Alexis_(disambiguation) |
| Disambiguations |     1614 |            0 | Alexis_(disambiguation) |
| Disambiguations |     1887 |            0 | Alexis_(disambiguation) |
+-----------------+----------+--------------+-------------------------+
3 rows in set (0.08 sec)
Reedy changed the task status from Open to Stalled.Sep 5 2017, 9:11 PM

I'm finding more crap to cleanup, see how things look when T174513 is done

Sorry, I do not know what I had in mind with the query (I think initialy I wanted to show different values of qc_value or something, but anyway, what I meant to show, at least right now, was:

root@neodymium:~$ mysql -h dbstore2002.codfw.wmnet -P3311 enwiki -e "SELECT qc_type, qc_namespace, qc_title, count(*), GROUP_CONCAT(qc_value) FROM querycache GROUP BY qc_type, qc_namespace, qc_title HAVING count(*) > 1"                                               +-----------------+--------------+-------------------------------------------+----------+------------------------+
| qc_type         | qc_namespace | qc_title                                  | count(*) | GROUP_CONCAT(qc_value) |
+-----------------+--------------+-------------------------------------------+----------+------------------------+
| Disambiguations |            0 | A_cappella_(disambiguation)               |        2 | 2411,7619              |
| Disambiguations |            0 | Abba_(disambiguation)                     |        2 | 880,2470               |
| Disambiguations |            0 | Alberta_(disambiguation)                  |        2 | 717,1504               |
| Disambiguations |            0 | Alexis_(disambiguation)                   |        3 | 1887,1614,1601         |
| Disambiguations |            0 | Algol_(disambiguation)                    |        2 | 1453,1394              |
| Disambiguations |            0 | Athens_(disambiguation)                   |        2 | 1184,1216              |
| Disambiguations |            0 | Atom_(disambiguation)                     |        2 | 1194,902               |
| Disambiguations |            0 | Babel_fish                                |        2 | 3818,3824              |
| Disambiguations |            0 | Baku_(disambiguation)                     |        2 | 4571,4566              |
| Disambiguations |            0 | Bee_(disambiguation)                      |        2 | 3710,4654              |
| Disambiguations |            0 | Beryl_(disambiguation)                    |        2 | 4074,4910              |
| Disambiguations |            0 | Brewing_(disambiguation)                  |        2 | 4410,8748              |
| Disambiguations |            0 | Citadel_(disambiguation)                  |        2 | 6695,7603              |
| Disambiguations |            0 | Citation_(disambiguation)                 |        2 | 7603,6201              |
| Disambiguations |            0 | Cloning_(disambiguation)                  |        2 | 6910,7800              |
| Disambiguations |            0 | Court_of_Criminal_Appeal_(disambiguation) |        2 | 5257,643               |
| Disambiguations |            0 | Critical_density                          |        2 | 5382,4116              |
| Disambiguations |            0 | DES                                       |        2 | 7927,5715              |
| Disambiguations |            0 | Datum                                     |        2 | 8495,6829              |
| Disambiguations |            0 | Daugava                                   |        2 | 3335,3536              |
| Disambiguations |            0 | Diaspora_(disambiguation)                 |        2 | 8613,8591              |
| Disambiguations |            0 | Drum_(disambiguation)                     |        2 | 9079,7950              |
| Disambiguations |            0 | Electron_tube                             |        2 | 6944,2392              |
| Disambiguations |            0 | England_(disambiguation)                  |        2 | 9322,9316              |
| Disambiguations |            0 | Entropy_(disambiguation)                  |        2 | 7363,9891              |
| Disambiguations |            0 | Europe_(disambiguation)                   |        2 | 9239,9240              |
| Disambiguations |            0 | Existence_(disambiguation)                |        2 | 9302,9593              |
| Disambiguations |            0 | IDB                                       |        3 | 6003,5451,3679         |
| Disambiguations |            0 | Inflation_(disambiguation)                |        2 | 5382,7958              |
| Disambiguations |            0 | Isostatic                                 |        2 | 6056,5236              |
| Disambiguations |            0 | Isotherm                                  |        2 | 4512,1418              |
| Disambiguations |            0 | Italian_dialects                          |        2 | 8128,7578              |
| Disambiguations |            0 | Ndebele_people                            |        2 | 3612,3464              |
| Disambiguations |            0 | Pythagorean                               |        2 | 1485,9553              |
| Disambiguations |            0 | Red_Army_Choir                            |        2 | 4567,8994              |
| Disambiguations |          108 | Chemical_elements                         |        2 | 5659,904               |
+-----------------+--------------+-------------------------------------------+----------+------------------------+

Haha. Glad it wasn't just me.

FWIW, the "Disambiguations" qc_type is going to be deleted. It (along with some others) is a value that's a few years old, and isn't used anymore. So I don't know if it's really representative of the rest of the tables contents

https://gerrit.wikimedia.org/r/#/c/376072 is WIP... :)

Krinkle renamed this task from Add a primary key to querycache to Add a primary key to querycache table.Jul 18 2019, 9:26 PM
Krinkle moved this task from To triage to Maintenance reports on the MediaWiki-Special-pages board.

If it's not feasible to weed out duplicates, possibly even a (qc_type,qc_value,qc_namespace,qc_title) PK could be an improvement. For MySQL at least, it'd avoid bug 76252, and reduce overall storage space usage when using InnoDB.[1]


[1] https://dev.mysql.com/doc/refman/5.7/en/innodb-index-types.html

Change 345968 abandoned by Paladox:

[mediawiki/core@master] Add primary key qc_type and qc_value

Reason:

https://gerrit.wikimedia.org/r/345968