On Special:EditWatchlist, users can currently only filter by namespace. It would be helpful if they could search by page title name, especially to help with removing pages or assigning labels. For users with large numbers of watched pages, it can currently be hard - especially due to pagination - to get to the page or group of pages that you're interested in taking actions on.
Description
Details
| Subject | Repo | Branch | Lines +/- | |
|---|---|---|---|---|
| SpecialEditWatchlist: Add search | mediawiki/core | master | +113 -17 |
Related Objects
Event Timeline
Change #1238455 had a related patch set uploaded (by Samtar; author: Samtar):
[mediawiki/core@master] SpecialEditWatchlist: Add search
Just an initial run at this (currently just a title search) - should probably get some design input but went with:
Just noting that I'm working on this for pagination support (currently it only searches the page you are currently on)
Urgent suggestion, speaking as a Commons user with 37000 items on my watchlist, and intending to grow that further:
Allow users who are editing their watchlist labels, to filter by page names via a free text entry. In my case, I have a large number of categories on my watchlist that are named "<prefix> books <suffix>" and "<prefix> maps <suffix>". I would like to first filter them all, then checkmark them by the bulk and assign labels.
I wanted to endorse this request I have always had thousands of articles on my list, and it makes no sense to manually go through and do it. It would be particularly useful to have the search support the various special words in the search index -- (i.e. incategory) so that you could batch edit, based on existing metadata.
I don't think this is possible - the search index doesn't know what you're watching, and we can't join between it and the databases
The search index doesn't know, but we could add a filter on category titles couldn't we? Even if it's just a single category allowed (i.e. as a separate search field), that might still be useful.
The above patch looks good to me, I left a few comments. I think the case-insensitivity might need to be handled differently for mysql/sqlite/postgres. And is a prefixed wildcard going to be okay, performances-wise?
And is a prefixed wildcard going to be okay, performances-wise?
Think we'll need to talk to @Ladsgroup
You mean a filter on category titles in the query? Yeah we could, but I thought @Sadads was looking for a more general expansion of all the "special" words for searching to this page
Yes good point, I didn't take that in properly. But categories might be easier than say deepcat or hastemplate (although that latter might be possible, as well as linksto? And maybe even creationdate and lasteditdate?). But yeah, I think it might be simpler to implement these as separate fields rather than relying on CirrusSearch in some way (which presumably is impossible, as you say!).
For the title, should we just change it to a title prefix search for now, because we know that'll probably be fine?
@Ladsgroup: the query part that we're looking to add would be the WHERE wl_title LIKE '%lorem%' in the following (shown here for MySQL, but the other DBs would be similar), and we could drop the leading wildcard if that's a concern:
SELECT wl_namespace, wl_title, wl_notificationtimestamp, we_expiry, (SELECT GROUP_CONCAT(wlm_label SEPARATOR ',') FROM `watchlist_label_member` WHERE (wlm_item=wl_id) ) AS `wlm_label_summary` FROM `watchlist` LEFT JOIN `watchlist_expiry` ON ((wl_id = we_item)) WHERE wl_user = 2 AND ((we_expiry IS NULL OR we_expiry > '20260227064503')) AND (CONVERT(wl_title USING utf8mb4) LIKE '%lorem ipsum%' ESCAPE '`') AND wl_namespace IN (0,2,4,6,8,10,12,14,106,250,252,516,828,1198) ORDER BY wl_namespace,wl_title LIMIT 51
(Although perhaps there's a better process for getting DBAs' opinions on things!)
Hi, yeah. The leading '%' would be problematic and can cause issues plus it would be a very easy DDoS vector. Plus the convert function renders all index speed improvements useless and shouldn't be even needed.
Plus the convert function renders all index speed improvements useless and shouldn't be even needed.
@Ladsgroup: Thanks for looking at this! What is the correct way to get a case-insensitive search?
For sorting the labels, we're using CONVERT(wll_name USING utf8mb4) (the wll_name is a varbinary column the same as wl_title).
A search like SELECT * FROM watchlist WHERE wl_title LIKE 'blah%' is case-sensitive, but SELECT * FROM watchlist WHERE wl_title like 'blah%' COLLATE utf8mb4_general_ci is not.
(Actually, my comparisons with the watchlist_label table might be wrong because that has a default collation of utf8mb4_general_ci instead of binary. That might be a bug I'm not sure. But anyway, not relevant to this task.)
If you want case insesnstive search. You should use a search backend (opensearch, elastic, etc.). Beside that it'll be slow and it won't work there are many complexities. Let me give you an example. Lots of people don't know but uppercase I in Turkish and the lower case i are different letters: https://en.wikipedia.org/wiki/Turkish_alphabet so you'd need to build a different system for Turkish. I wouldn't be surprised if there are more languages with complexities like that.
For charset, we always use binary production. Anything that is not binary, should be converted to that.
Change #1238455 merged by jenkins-bot:
[mediawiki/core@master] SpecialEditWatchlist: Add search
