RFC: Proposal to add wl_addedtimestamp attribute to the watchlist table
Open, Needs TriagePublic

Description

Problem Statement

In most cases, users watchlist entries (on Special:EditWatchlist) can grow as large as >700 articles and figuring out the latest article added to watchlist (for example) is a difficult task (taking into consideration the user can't even remember the name of this particular article that was added), so, I was experimenting on building an extension (say WatchListFilter extension maybe?) to filter watchlist based on various parameters (like desc order of timestamp, asc order of timestamp, date added, etc) but by default, based on their timestamps added.

See also: T100508: Watchlist expiry: Watch pages for a specified time frame (2013), T208487: [RfC] Add CURRENT_TIMESTAMP support for `wl_notificationtimestamps` in watchlist table

Proposed Solution

MediaWiki already has a way of dealing with Timestamps, see https://www.mediawiki.org/wiki/Manual:Timestamp, which is very nice so a solution to handle this problem would be altering the watchlist table to add the wl_addedtimestamp attribute to it so it can be used in the filtering process. The attribute will store MW timestamp that will be used by MW or other extensions that may need this feature.

There would be a lot of filters that can help the users find the article added to watchlist without knowing the name of the article;

  • ASC order of TS
  • DESC order of TS
  • Range TS / Date search
  • Order by Alphabet (this can make use of other attributes in the watchlist table)
  • Filter by Regex
  • etc
MotivationOther thoughts
Trying to solve the finding watchlist entry problem with an extension I'm intending to work on "WatchListFilter".Whether the feature may be made available in core could be a different topic for discussion but the basic level usage here is for an extension. Also, other topics as per how the filter will look on the Special:EditWatchlist special page based on the OOUI-fication that has happened and others can be addressed on other tickets, maybe sub-tasks of this one I guess :)
Alternative solutions

I've not had deep investigations on an alternative way to solve this problem with other means different from this proposal. Looking at the current watchlist table, I don't yet see an alternative means although there may be a way I don't yet know but others do :).

As proposed here: T209773#4756088, that can be another approach in introducing this feature.

Side effects
  • More data going into the database which will increase the DB size over time, this feature is quite advantageous as other things can be built on it like the WatchListFilter extension etc.
  • After adding this attribute to the table (approach 1) or even approach 2 (as suggested by Brian), items already added to watchlist before the introduction to this feature will have no timestamp or default timestamp (which can cause some inaccuracy in the filtration process). So this is something worth noting how to deal with.
  • Another side effect (adv) can be that, this feature can act as a partial "watch this and read for later" as the user won't bother to bookmark the page or remember the name of the article. All the user needs to know is the approximate time the article was added to their watchlist and the filter can be used to figure the potential article.

Use cases (3 most possible scenarios)

(1) Getting the latest entry added watchlist: Assuming I added an entry to watchlist few days ago but have forgotten the exact name of the article but I've not added any other entry to my watchlist (so the one I added few days ago is the latest added), the current system can't tell which was the last entry added but with a feature like this, the user can just sort by/filter by descending order of time added and the first entry at the top of the list will be the one that was lastly added and then read it. So the user won't bother thinking or trying to figure out the name, the filter will help the user achieve that.

(2) Finding an entry in watchlist in order of alphabet: In another scenario, this kind of filter could be needed when the watchlist entry is added (at any time) but the user only knows the first letter that the article begins with. So the user can just filter the entries in ascending or descending order of alphabet or use a regex to get just all the articles in the tries that begin with "A" (for example) and then check the resulting list and see if the article will be found.

(3) Get the oldest watchlist entry: In some cases, users would want to cleanup their watchlist entries but may want to begin with the oldest added to watchlist but currently they can't know exactly which one is the oldest as there is way for MW to figure out that. So with a feature like this, there can be a filter to filter in ascending order of time added and then the latest at the top of the entry would be the oldest. If the user hasn't read it yet, he/she may read it and then remove it from watchlist and the process continue so very old watchlist entries are removed (some clean-up to watchlist approach). I would not want to cleanup/remove what I just added yesterday so a filter will help me do this.


Comments

An RfC was filed some weeks ago (as of today) which was quite wrong per the request, see https://phabricator.wikimedia.org/T208487, thanks to @daniel, @aaron, @Catrope, @Anomie for feedback that enlightened me to create this "real" RfC proposal. This RfC is a product of T208487.

D3r1ck01 created this task.Nov 17 2018, 9:37 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptNov 17 2018, 9:37 PM
D3r1ck01 renamed this task from [RfC] Add wl_addedtimestamp attribute to the watchlist table to RFC: Add wl_addedtimestamp attribute to the watchlist table.Nov 17 2018, 9:38 PM
D3r1ck01 renamed this task from RFC: Add wl_addedtimestamp attribute to the watchlist table to RFC: Proposal to add wl_addedtimestamp attribute to the watchlist table.
D3r1ck01 updated the task description. (Show Details)Nov 17 2018, 11:53 PM
D3r1ck01 updated the task description. (Show Details)Nov 17 2018, 11:59 PM
D3r1ck01 moved this task from Inbox to Backlog on the TechCom-RFC board.Nov 18 2018, 12:01 AM

From what I understand (This is bit beyond my expertise), for things like watchlist where a large number of rows are potentially scanned, its important to keep the tables narrow, so that more of it fits into a page of memory. (Assuming that's true... Which i have no idea. This is beyond my db knowledge) perhaps it makes sense to instead of adding more fields to the watchlist table, instead have a watchlist_info table in a 1:1 relationship which can have more information field (also move wl_notificationtimestamp over there).

So we could have watchlist table with only wl_id, wl_user, wl_namespace, wl_title

And then we could have watchlist_info with wl_id, wli_notificationtimestampe, wli_addedtimestamp, whatever else. (maybe denormalizing wl_user into it for ApiQueryWatchlistRaw?). If we implement expiring watchlists, expiry time could go in this table too.

Which would then hopefully keep the common case of no filtering fast, and requiring a join for more complex filters.

The downside is it would be harder to make an effective index. Although i suppose if we denormalize wl_user into watchlist_info we could index on (wli_user, wli_addedtimestamp) and join into watchlist.

Just a thought. DBA's should correct me if this comment is complete non-sense.

Great input @Bawolff. That sounds great to me, and is alternative solution. I do see something with this approach. In the approach mentioned in the proposal, when a watchlist entry is removed (unwatched), everything goes away with just 1 query (pretty fast I guess). But introducing another table would bring 1 more query as watchlist and watchlist_info would have to be updated upon watchlist entry removal. This is to make sure we don't have entries in watchlist_info that has been removed from watchlist.

So per above, I'm wondering if we should go for 2 queries or 1 query depending on how they're implemented (I guess). But I'm pretty sure that the overall efficiency will be better if we got for the db normalization approach you mentioned :)

D3r1ck01 updated the task description. (Show Details)Nov 20 2018, 11:12 AM
kchapman moved this task from Backlog to Inbox on the TechCom-RFC board.Nov 20 2018, 5:13 PM
kchapman added a subscriber: kchapman.

Seems like this should be in the Inbox for TechCom rather than the Backlog.

D3r1ck01 added a comment.EditedNov 20 2018, 5:20 PM

Thank you very much @kchapman, and almost yes :) I was still supposed to develop the use-case section so the ticket can come into the Inbox in a full package. But if you think it's still fine to be there, no problem. I can still continue to beef it up while it's in "Inbox" :}

kchapman moved this task from Inbox to Backlog on the TechCom-RFC board.Nov 20 2018, 5:23 PM

@D3r1ck01 oops, misunderstanding on my side. Please move to the Inbox when you are ready.

D3r1ck01 added a comment.EditedNov 20 2018, 5:24 PM

@D3r1ck01 oops, misunderstanding on my side. Please move to the Inbox when you are ready.

A billion thanks :) and yes, I'll move it to Inbox when it's ready to go :) But I think I understand why you moved it to inbox, because discussions already started on the ticket. @Bawolff suggested I put some more use-cases, so it was already complete (to me) until Brian's suggestion, now it's incomplete :) But thanks!

I should emphasize, I really don't know if what I said makes any sense, and a DBA should probably weigh in before doing much based on my comment.

Does this has something to do with T125991: Add wl_timestamp to the watchlist table? Is that some sort of duplicate?

Does this has something to do with T125991: Add wl_timestamp to the watchlist table? Is that some sort of duplicate?

Yes, it looks like this and that are duplicates. I'll leave it to someone else to figure out which one to close as a duplicate of the other and what, if anything, to copy to the one remaining open.

Krinkle moved this task from Backlog to Schema on the MediaWiki-Database board.Nov 22 2018, 11:08 PM
D3r1ck01 updated the task description. (Show Details)Nov 26 2018, 11:02 AM
D3r1ck01 updated the task description. (Show Details)
D3r1ck01 moved this task from Backlog to Inbox on the TechCom-RFC board.
Joe added a subscriber: Joe.Nov 28 2018, 9:50 PM

Looking at the watchlist table, it includes a wl_id which at least on mysql is declared as an int(10) unsigned NOT NULL AUTO_INCREMENT.

So if all you need is to find the order of insertion of elements in the watchlists, the id itself should give you a coarse-grained version of that, and you won't need a schema change for that - or am I missing something?

Does this has something to do with T125991: Add wl_timestamp to the watchlist table? Is that some sort of duplicate?

I just talked to @Addshore about this. He reminded me that the reason we considered recording when an item was added to the watchlist was to propose an alternative to watchlist expiry: the timestamp would allow people to query for "old" entries on their watchlist, and just select and remove them all. Whether that is a viable solution would have to be determined when doing UX design for the watchlist expiry request.

@D3r1ck01 is there community support for this idea? Do many people find this useful? It seems pointless to discuss the technical merit of your proposal if we don't know whether it is actually needed, or if anyone commits to implementing it. For now, this seems like a technical solution for a problem of which we do not know if we actually have it. It seems to me, this should be proposed as a feature. The Community Wishlist Survey would have been perfect for this, but the submission period for this ended a couple of weeks ago https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019. There is however a relate proposal there that has a lot of support: https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019/Watchlists/Watchlist_item_expiration.

@D3r1ck01 is there community support for this idea? Do many people find this useful?

Great question @daniel. I think we should find out about this!

The Community Wishlist Survey would have been perfect for this, but the submission period for this ended a couple of weeks ago https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019.

Yes I agree this would have been a good place to pitch this but it's rather unfortunate that submissions are over but could we look for a means (if possible) to send this to community and get feedback? I don't know if a channel we can use but maybe others have ideas?

There is however a relate proposal there that has a lot of support: https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2019/Watchlists/Watchlist_item_expiration.

True!

To further clarify on the Blocked external/Not db team by Manuel, this seems a fairly simple and strightforward [famous last words], and not worrying storage-wise, you don't need any previous discussion with us to work on it- if agreed. We would like, however, to review potential new queries on implementation, to make sure indexing is used appropriately (it most likely will need a new index to filter on it, and that may not be that simple except on trivial usages- e.g. T209773#4783873 will need thorough review to support large watchlists). We are obviously always open for questions (ping us)- but we are not leading this work.

daniel moved this task from Inbox to Backlog on the TechCom-RFC board.Nov 30 2018, 6:29 PM

@D3r1ck01 wrote

I don't know if a channel we can use but maybe others have ideas?

There is no single channel to the community, since there is no single community. You could start with the English language Wikipedia, perhaps on https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(proposals) or on the mailing list https://en.wikipedia.org/wiki/Wikipedia:Mailing_lists#Public_mailing_lists.

I'm moving this to the RFC backlog for now, since without commitment to resourcing, discussing the technical details seems pointless.