Page MenuHomePhabricator

Permalink redirecting: Support plain section links on wikis which haven't indexed their full history
Closed, ResolvedPublic

Description

In T349653 we added support for finding an archived thread by limiting our search to headings which have appeared in the current page's history. However in production it would take too long to back-fill our database with an index of all old revisions, so we have limited ourselves to current revisions. With this restriction we will need to perform the search in other ways.

Event Timeline

Change 994746 had a related patch set uploaded (by Esanders; author: Esanders):

[mediawiki/extensions/DiscussionTools@master] Add more search methods to findNewestRevisionsByHeading

https://gerrit.wikimedia.org/r/994746

Change 994746 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Add more search methods to findNewestRevisionsByHeading

https://gerrit.wikimedia.org/r/994746

The code generates some ugly-looking SQL queries, so I'll post them here for review and future reference.

Although they look scary, I think they're good enough both in theory (queries are using indexes, no filesorts) and in practice (I ran them on the enwiki analytics replicas and they took ~0.1 s, even when returning no results, which should be the worst case).

But there are some inefficiencies – we query several tables multiple times to fetch the itr_parent__itid_itemid field which we don't use, and we also query them again in the big itid_itemid IN … subquery at the end which could be merged with the main query. It seems like the new SelectQueryBuilder makes it too easy to build up queries and has allowed us to catch a mild case of ORM-itis. We should try to avoid that, and hopefully it doesn't make the code generating these queries too messy with special cases.

I filed you some follow-up work :)

(The first query has already been like this even before the changes, and it's mostly my fault, so I think this doesn't need to block this patch.)

(By the way, I formatted the SQL code below with https://nene.github.io/prettier-sql-playground/, it's the best SQL code formatter I've found.)

1SELECT
2 itr_id,
3 it_itemname,
4 it_timestamp,
5 it_actor,
6 itid_itemid,
7 itr_parent_id,
8 itr_transcludedfrom,
9 itr_level,
10 itr_headinglevel,
11 itr_revision_id,
12 page_id,
13 page_namespace,
14 page_title,
15 page_is_redirect,
16 page_is_new,
17 page_touched,
18 page_links_updated,
19 page_latest,
20 page_len,
21 page_content_model,
22 page_lang,
23 actor_id,
24 actor_name,
25 actor_user,
26 itr_parent__itid_itemid
27FROM
28 `discussiontools_items`
29 JOIN `discussiontools_item_pages` ON (itp_items_id = it_id)
30 JOIN `discussiontools_item_revisions`
31 ON ((itr_items_id = it_id) AND (itr_revision_id = itp_newest_revision_id))
32 JOIN `discussiontools_item_ids` ON (itid_id = itr_itemid_id)
33 LEFT JOIN `page` ON (page_id = itr_transcludedfrom)
34 LEFT JOIN `actor` ON (actor_id = it_actor)
35 LEFT JOIN (
36 SELECT
37 itr_id AS `itr_parent__itr_id`,
38 itid_itemid AS `itr_parent__itid_itemid`
39 FROM
40 `discussiontools_items`
41 JOIN `discussiontools_item_pages` ON (itp_items_id = it_id)
42 JOIN `discussiontools_item_revisions`
43 ON (
44 (itr_items_id = it_id)
45 AND (itr_revision_id = itp_newest_revision_id)
46 )
47 JOIN `discussiontools_item_ids` ON (itid_id = itr_itemid_id)
48 ) AS `sqb1`
49 ON (itr_parent_id = itr_parent__itr_id)
50WHERE
51 (
52 itid_itemid IN (
53 SELECT itid_itemid
54 FROM
55 `discussiontools_items`
56 JOIN `discussiontools_item_pages` ON (itp_items_id = it_id)
57 JOIN `discussiontools_item_revisions`
58 ON (
59 (itr_items_id = it_id)
60 AND (itr_revision_id = itp_newest_revision_id)
61 )
62 JOIN `discussiontools_item_ids` ON (itid_id = itr_itemid_id)
63 JOIN `revision` ON (rev_id = itr_revision_id)
64 WHERE (itid_itemid LIKE 'h-B-%' ESCAPE '`') AND rev_page = 1940
65 )
66 )
67LIMIT 50;
68
69+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+----------------------------------------------------------------------------------------------------------+------+------------------------------+
70| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
71+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+----------------------------------------------------------------------------------------------------------+------+------------------------------+
72| 1 | PRIMARY | revision | ref | PRIMARY,rev_page_actor_timestamp,rev_page_timestamp | rev_page_timestamp | 4 | const | 153 | Using index; Start temporary |
73| 1 | PRIMARY | discussiontools_item_revisions | ref | itr_itemid_id_revision_id,itr_revision_id | itr_revision_id | 8 | enwiki.revision.rev_id | 7 | Using index condition |
74| 1 | PRIMARY | discussiontools_items | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_items_id | 1 | Using where; Using index |
75| 1 | PRIMARY | discussiontools_item_pages | eq_ref | itp_items_id_page_id,itp_items_id_newest_revision_id | itp_items_id_newest_revision_id | 12 | enwiki.discussiontools_items.it_id,enwiki.discussiontools_item_revisions.itr_revision_id | 1 | Using index |
76| 1 | PRIMARY | discussiontools_item_ids | eq_ref | PRIMARY,itid_itemid | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_itemid_id | 1 | Using where |
77| 1 | PRIMARY | discussiontools_item_ids | eq_ref | PRIMARY,itid_itemid | itid_itemid | 257 | enwiki.discussiontools_item_ids.itid_itemid | 1 | Using index; End temporary |
78| 1 | PRIMARY | discussiontools_item_revisions | ref | itr_itemid_id_revision_id,itr_revision_id | itr_itemid_id_revision_id | 4 | enwiki.discussiontools_item_ids.itid_id | 1 | Using index condition |
79| 1 | PRIMARY | discussiontools_items | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_items_id | 1 | Using where |
80| 1 | PRIMARY | discussiontools_item_pages | eq_ref | itp_items_id_page_id,itp_items_id_newest_revision_id | itp_items_id_newest_revision_id | 12 | enwiki.discussiontools_item_revisions.itr_items_id,enwiki.discussiontools_item_revisions.itr_revision_id | 1 | Using where; Using index |
81| 1 | PRIMARY | page | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_transcludedfrom | 1 | Using where |
82| 1 | PRIMARY | actor | eq_ref | PRIMARY | PRIMARY | 8 | enwiki.discussiontools_items.it_actor | 1 | Using where |
83| 1 | PRIMARY | discussiontools_item_revisions | eq_ref | PRIMARY,itr_itemid_id_revision_id,itr_revision_id | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_parent_id | 1 | Using where |
84| 1 | PRIMARY | discussiontools_items | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_items_id | 1 | Using where; Using index |
85| 1 | PRIMARY | discussiontools_item_pages | eq_ref | itp_items_id_page_id,itp_items_id_newest_revision_id | itp_items_id_newest_revision_id | 12 | enwiki.discussiontools_items.it_id,enwiki.discussiontools_item_revisions.itr_revision_id | 1 | Using index |
86| 1 | PRIMARY | discussiontools_item_ids | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_itemid_id | 1 | Using where |
87+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+----------------------------------------------------------------------------------------------------------+------+------------------------------+
88
89
90SELECT
91 itr_id,
92 it_itemname,
93 it_timestamp,
94 it_actor,
95 itid_itemid,
96 itr_parent_id,
97 itr_transcludedfrom,
98 itr_level,
99 itr_headinglevel,
100 itr_revision_id,
101 page_id,
102 page_namespace,
103 page_title,
104 page_is_redirect,
105 page_is_new,
106 page_touched,
107 page_links_updated,
108 page_latest,
109 page_len,
110 page_content_model,
111 page_lang,
112 actor_id,
113 actor_name,
114 actor_user,
115 itr_parent__itid_itemid
116FROM
117 `discussiontools_items`
118 JOIN `discussiontools_item_pages` ON (itp_items_id = it_id)
119 JOIN `discussiontools_item_revisions`
120 ON ((itr_items_id = it_id) AND (itr_revision_id = itp_newest_revision_id))
121 JOIN `discussiontools_item_ids` ON (itid_id = itr_itemid_id)
122 LEFT JOIN `page` ON (page_id = itr_transcludedfrom)
123 LEFT JOIN `actor` ON (actor_id = it_actor)
124 LEFT JOIN (
125 SELECT
126 itr_id AS `itr_parent__itr_id`,
127 itid_itemid AS `itr_parent__itid_itemid`
128 FROM
129 `discussiontools_items`
130 JOIN `discussiontools_item_pages` ON (itp_items_id = it_id)
131 JOIN `discussiontools_item_revisions`
132 ON (
133 (itr_items_id = it_id)
134 AND (itr_revision_id = itp_newest_revision_id)
135 )
136 JOIN `discussiontools_item_ids` ON (itid_id = itr_itemid_id)
137 ) AS `sqb1`
138 ON (itr_parent_id = itr_parent__itr_id)
139WHERE
140 (
141 itid_itemid IN (
142 SELECT itid_itemid
143 FROM
144 `discussiontools_items`
145 JOIN `discussiontools_item_pages` ON (itp_items_id = it_id)
146 JOIN `discussiontools_item_revisions`
147 ON (
148 (itr_items_id = it_id)
149 AND (itr_revision_id = itp_newest_revision_id)
150 )
151 JOIN `discussiontools_item_ids` ON (itid_id = itr_itemid_id)
152 JOIN `page` ON (page_id = itp_page_id)
153 WHERE
154 (itid_itemid LIKE 'h-B-%' ESCAPE '`')
155 AND (page_title LIKE 'ThreadItemStore2/%' ESCAPE '`')
156 AND page_namespace = 1
157 )
158 )
159LIMIT 50;
160
161+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+------------------------------------------------------------------------------------------+------+--------------------------------------------------------------+
162| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
163+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+------------------------------------------------------------------------------------------+------+--------------------------------------------------------------+
164| 1 | PRIMARY | page | range | PRIMARY,page_name_title | page_name_title | 261 | NULL | 1 | Using where; Using index; Start temporary |
165| 1 | PRIMARY | discussiontools_item_ids | range | PRIMARY,itid_itemid | itid_itemid | 257 | NULL | 5296 | Using where; Using index; Using join buffer (flat, BNL join) |
166| 1 | PRIMARY | discussiontools_item_ids | eq_ref | PRIMARY,itid_itemid | itid_itemid | 257 | enwiki.discussiontools_item_ids.itid_itemid | 1 | Using index |
167| 1 | PRIMARY | discussiontools_item_revisions | ref | itr_itemid_id_revision_id,itr_revision_id | itr_itemid_id_revision_id | 4 | enwiki.discussiontools_item_ids.itid_id | 1 | Using index condition |
168| 1 | PRIMARY | discussiontools_items | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_items_id | 1 | Using where; Using index |
169| 1 | PRIMARY | discussiontools_item_pages | eq_ref | itp_items_id_page_id,itp_items_id_newest_revision_id | itp_items_id_page_id | 12 | enwiki.discussiontools_items.it_id,enwiki.page.page_id | 1 | Using index condition; Using where; End temporary |
170| 1 | PRIMARY | discussiontools_item_revisions | ref | itr_itemid_id_revision_id,itr_revision_id | itr_itemid_id_revision_id | 4 | enwiki.discussiontools_item_ids.itid_id | 1 | Using index condition |
171| 1 | PRIMARY | discussiontools_items | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_items_id | 1 | Using where |
172| 1 | PRIMARY | discussiontools_item_pages | eq_ref | itp_items_id_page_id,itp_items_id_newest_revision_id | itp_items_id_newest_revision_id | 12 | enwiki.discussiontools_items.it_id,enwiki.discussiontools_item_revisions.itr_revision_id | 1 | Using index |
173| 1 | PRIMARY | page | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_transcludedfrom | 1 | Using where |
174| 1 | PRIMARY | actor | eq_ref | PRIMARY | PRIMARY | 8 | enwiki.discussiontools_items.it_actor | 1 | Using where |
175| 1 | PRIMARY | discussiontools_item_revisions | eq_ref | PRIMARY,itr_itemid_id_revision_id,itr_revision_id | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_parent_id | 1 | Using where |
176| 1 | PRIMARY | discussiontools_items | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_items_id | 1 | Using where; Using index |
177| 1 | PRIMARY | discussiontools_item_pages | eq_ref | itp_items_id_page_id,itp_items_id_newest_revision_id | itp_items_id_newest_revision_id | 12 | enwiki.discussiontools_items.it_id,enwiki.discussiontools_item_revisions.itr_revision_id | 1 | Using index |
178| 1 | PRIMARY | discussiontools_item_ids | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_itemid_id | 1 | Using where |
179+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+------------------------------------------------------------------------------------------+------+--------------------------------------------------------------+
180
181
182SELECT itid_itemid AS `value`
183FROM
184 `discussiontools_items`
185 JOIN `discussiontools_item_pages` ON (itp_items_id = it_id)
186 JOIN `discussiontools_item_revisions`
187 ON ((itr_items_id = it_id) AND (itr_revision_id = itp_newest_revision_id))
188 JOIN `discussiontools_item_ids` ON (itid_id = itr_itemid_id)
189 JOIN `page` ON ((page_id = itp_page_id) AND (page_latest = itr_revision_id))
190WHERE (itid_itemid LIKE 'h-B-%' ESCAPE '`')
191LIMIT 2;
192
193+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+------------------------------------------------------------------------------------------+------+--------------------------+
194| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
195+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+------------------------------------------------------------------------------------------+------+--------------------------+
196| 1 | SIMPLE | discussiontools_item_ids | range | PRIMARY,itid_itemid | itid_itemid | 257 | NULL | 5296 | Using where; Using index |
197| 1 | SIMPLE | discussiontools_item_revisions | ref | itr_itemid_id_revision_id,itr_revision_id | itr_itemid_id_revision_id | 4 | enwiki.discussiontools_item_ids.itid_id | 1 | Using index condition |
198| 1 | SIMPLE | discussiontools_items | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_revisions.itr_items_id | 1 | Using where; Using index |
199| 1 | SIMPLE | discussiontools_item_pages | eq_ref | itp_items_id_page_id,itp_items_id_newest_revision_id | itp_items_id_newest_revision_id | 12 | enwiki.discussiontools_items.it_id,enwiki.discussiontools_item_revisions.itr_revision_id | 1 | |
200| 1 | SIMPLE | page | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.discussiontools_item_pages.itp_page_id | 1 | Using where |
201+------+-------------+--------------------------------+--------+------------------------------------------------------+---------------------------------+---------+------------------------------------------------------------------------------------------+------+--------------------------+
202

Ryasmeen moved this task from QA to Ready for Sign Off on the Editing-team (Kanban Board) board.
Ryasmeen edited projects, added Verified; removed Editing QA, Patch-For-Review.
Ryasmeen subscribed.

Tested in bn.wiki: was able to find archived section from section heading text.

Screenshot 2024-02-12 at 7.38.40 PM.png (430×2 px, 151 KB)