RFC: Expiring watch list entries
Closed, Resolved, Public

Description

  • Affected components: MediaWiki core (Watchlists).
  • Engineer for initial implementation: Community Tech team (WMF).
  • Code steward: Community-Tech

Motivation

Background

This feature request appears on both the German Technical Wishlist from 2014 and the WMF Community Tech team wishlist for 2015, at position #12. A bug for this feature has existed since 2006 (T8964).

WMF Community Wishlist proposal and votes: https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Watchlists#Watchlist_timed_expiry

Requirements

Expiring watchlist entries would be useful for the following reasons:

  • Watch a talk page of a user that you message for a response for a limited time.
  • Watch a page for a specified amount of time after a page protection expires.
  • Watch a page for a short amount of time after reverting vandalism on the page.
  • Watch a time-boxed discussion page for the length of the timebox.

Exploration

(Use this space for data gathering, status quo, proposals, other considerations etc.)

Proposal

It should be possible to watch a page and have it be removed from your watchlist after a custom timeframe.
Users should initially be able to change the expiry time of a watchlist entry using the API.
Users should also be able to set a number of days for how long a page should be watched or make watchlist entries never expire on the Special:EditWatchlist page.

The initial implementation would not include any changes to watching a page directly from an article page (i.e. clicking the star button on the toolbar). All entries added to the watchlist by clicking the star button would still initially have an unlimited expiry date.

The feature would be first offered as a beta feature.

The initial proposal is to add a single new field to the watchlist table containing an expiry timestamp.
All selects on the watchlist table would be updated to only select watchlist entries that have not yet expired.
Users would be able to set an expiry when watching a page using action=watch by specifying a parsable expiry (similar to protection expiry through the API).
The initial implementation would also have a maintenance script to remove expired entries.
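
A minimal sketch of the single-column approach described above, using Python and SQLite as stand-ins for MediaWiki's PHP and database layer. The column name wl_expiry, the simplified table, and all function names are assumptions made for illustration. Timestamps are 14-digit strings (YYYYMMDDHHMMSS), which compare correctly as plain text.

```python
import sqlite3


def create_schema(conn):
    # Simplified watchlist table; wl_expiry is a hypothetical new column.
    conn.execute("""
        CREATE TABLE watchlist (
            wl_user   INTEGER NOT NULL,
            wl_title  TEXT NOT NULL,
            wl_expiry TEXT  -- NULL means the entry never expires
        )
    """)


def watch(conn, user, title, expiry=None):
    conn.execute("INSERT INTO watchlist VALUES (?, ?, ?)", (user, title, expiry))


def watched_titles(conn, user, now):
    # Every select filters out entries whose expiry has passed.
    rows = conn.execute(
        "SELECT wl_title FROM watchlist"
        " WHERE wl_user = ? AND (wl_expiry IS NULL OR wl_expiry > ?)"
        " ORDER BY wl_title",
        (user, now),
    )
    return [r[0] for r in rows]


def purge_expired(conn, now):
    # What a maintenance script could run; returns the number of rows removed.
    cur = conn.execute(
        "DELETE FROM watchlist WHERE wl_expiry IS NOT NULL AND wl_expiry <= ?",
        (now,),
    )
    return cur.rowcount
```

Because reads already exclude expired rows, the purge step in this sketch only reclaims space; the feature would behave the same if the script ran late.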

Backend Refactoring

The WatchedItem class currently contains methods such as doDuplicateEntries, duplicateEntries, removeWatch, addWatch, batchAddWatch, resetNotificationTimestamp & load all of which do not belong in this class. They should be moved to a WatchedItemStore or something similar.
Looking at the usage of these methods, if they were moved, the only extension that would need updating would be Flow, which uses the duplicateEntries, removeWatch & addWatch methods in production code and tests.

Methods could then be added to this store such as loadWatchedItemsForUser, loadUnwatchedItems and perhaps loadUsersWatchingPage which would remove the spread of SQL that would need to be touched by this proposal.
The SQL that would need to change is currently distributed through the following classes: InfoAction, ApiQueryInfo, ApiQueryUserInfo, ApiQueryWatchlist, ApiQueryWatchlistRaw, EmailNotification, SpecialEditWatchlist, SpecialRecentchanges, SpecialRecentchangeslinked, SpecialUnwatchedpages, SpecialWatchlist.
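
To illustrate why a single store would reduce the spread of SQL, here is a rough Python sketch (the real class would be PHP). The method names are adapted from those suggested above; the simplified table, the wl_expiry column, and the `now` parameter are assumptions. The point is only that the "not expired" condition lives in one place.

```python
import sqlite3


class WatchedItemStore:
    """Single owner of all SQL against a simplified watchlist table."""

    # One shared condition, so an expiry rule would change in exactly one place.
    _NOT_EXPIRED = "(wl_expiry IS NULL OR wl_expiry > :now)"

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS watchlist ("
            " wl_user INTEGER NOT NULL, wl_title TEXT NOT NULL, wl_expiry TEXT,"
            " UNIQUE (wl_user, wl_title))"
        )

    def add_watch(self, user, title, expiry=None):
        # Re-watching a page overwrites its previous expiry.
        self.conn.execute(
            "INSERT OR REPLACE INTO watchlist VALUES (?, ?, ?)",
            (user, title, expiry),
        )

    def remove_watch(self, user, title):
        self.conn.execute(
            "DELETE FROM watchlist WHERE wl_user = ? AND wl_title = ?",
            (user, title),
        )

    def load_watched_items_for_user(self, user, now):
        rows = self.conn.execute(
            "SELECT wl_title FROM watchlist WHERE wl_user = :user AND "
            + self._NOT_EXPIRED + " ORDER BY wl_title",
            {"user": user, "now": now},
        )
        return [r[0] for r in rows]

    def load_users_watching_page(self, title, now):
        rows = self.conn.execute(
            "SELECT wl_user FROM watchlist WHERE wl_title = :title AND "
            + self._NOT_EXPIRED + " ORDER BY wl_user",
            {"title": title, "now": now},
        )
        return [r[0] for r in rows]
```

Callers such as special pages or API modules would then ask the store for items rather than building their own queries.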

Considerations

  • It might make sense to refactor access to watchlist items before or potentially after trying to implement this with the goal of having a single location that makes calls to the watchlist table (currently queries are spread between multiple locations)
  • The expiry field of the watchlist table may need an index
  • As mentioned in T8964 to cover further expansion to the watchlist table a more general properties field may be preferable, although this would likely mean selecting non expired watchlist items would be harder
  • It might make sense to also allow adjusting the expiry date of watchlist entry in the raw watchlist editing mode (Special:EditWatchlist/raw). One possible way to do this would be expanding the raw watchlist format so that each row could also include a number of days to watch the page. Such a change should be backwards-compatible so that importing and exporting of already existing watchlist would not require any actions from the user.
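
One way the backwards-compatible raw format could work, sketched in Python. The "Title|days" syntax is purely hypothetical (the pipe is chosen because it cannot occur in page titles); nothing in this RFC fixes the format. Lines without a suffix parse exactly as they do today, as entries that never expire.

```python
def parse_raw_watchlist(text):
    """Return (title, days) pairs; days is None for entries that never expire."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        title, sep, days = line.partition("|")
        title = title.strip()
        days = days.strip()
        if not title:
            continue
        if sep and days.isdecimal() and int(days) > 0:
            entries.append((title, int(days)))
        else:
            # Legacy line or unparsable suffix: treat as a permanent watch.
            entries.append((title, None))
    return entries


def format_raw_watchlist(entries):
    """Inverse of parse_raw_watchlist; permanent entries keep the old format."""
    return "\n".join(t if d is None else f"{t}|{d}" for t, d in entries)
```

An existing exported watchlist contains no pipes, so importing it through this parser yields only permanent entries and requires no action from the user.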

Key Questions

  • Should the expiry date be its own field or should a properties field be introduced?
  • Is a more automatic way of keeping the watchlist table clean needed or will a maintenance script do?
  • Will indexes be needed on the expiry column in order for this to scale?
  • Should the refactoring talked about actually happen and if so should it happen before or after the changes?
  • Should a custom expiry date be considered in the import/export process (using Special:EditWatchlist/raw), or should all imported and exported watchlist entries always have unlimited watching duration?

See also

After initial implementation

  • API modules that output info about watched items, such as ApiQueryInfo, ApiQueryWatchlist & ApiQueryWatchlistRaw, should display watchlist entry expiry times
  • Some sort of automatic removal of expired items may be desired

Event Timeline

I'm a bit worried that this effort has drifted significantly from the original use case. In order to expire watchlist items, all we need is a timestamp field and an expiration field. I'm sure that watchlist_props (or watchlist_tags) is a great idea that will solve lots of bugs and feature requests, but I don't think it has anything to do with this RFC, the original bug, or the Community Wishlist request. Why would the obvious and simple solution not be the correct one here?

I do see what you mean.
I think in the initial RFC discussion the general feeling was that users did not actually want to set an expiry date for individual watchlist items, but instead just wanted it to be easier to remove old items and keep a watchlist in check.

After more thinking and discussion with people at the hackathon I think the long term route forward should roughly be this, and of course all comments welcome.

  1. wl_id (already in progress)
  2. Refactoring (already in progress)
  3. watchlist_props table as described in T124752#1998193 (which would allow us to store an expiry time stamp). The benefit of having this in a separate table is that not all watch list items would have an expiry.
  4. watchlist_tags table (I feel this is better than multiple watchlists, but it essentially allows you to group watched items into separate lists).
  5. wl_timestamp field on the watchlist table (for cases where people don't want to set an expiry and instead want to remove all items that have been in their watchlist for X weeks)
  1. wl_id (already in progress)
  2. Refactoring (already in progress)
  3. watchlist_props table as described in T124752#1998193 (which would allow us to store an expiry time stamp). The benefit of having this in a separate table is that not all watch list items would have an expiry.

See T129486. This is IMO not blocked on anything right now. So we could go on with that, right?

  1. watchlist_tags table (I feel this is better than multiple watchlists, but it essentially allows you to group watched items into separate lists).
  2. wl_timestamp field on the watchlist table (for cases where people don't want to set an expiry and instead want to remove all items that have been in their watchlist for X weeks)

See T125991.

  1. watchlist_props table as described in T124752#1998193 (which would allow us to store an expiry time stamp). The benefit of having this in a separate table is that not all watch list items would have an expiry.

See T129486. This is IMO not blocked on anything right now. So we could go on with that, right?

Indeed

@Tobi_WMDE_SW @Addshore: I think this has to be discussed with product people at the WMF. One major concern that was raised during the discussion is that "expiry" is a concept we only have for blocks and page protection. It's problematic to have something happen to a user's watchlist without notice - things just disappear with no trace.

Additionally, it's unclear when, where, and how expiry for individual watchlist items can be defined or edited. The "bulk edit" mode for watchlists is particularly problematic.

Basically, the technical side of making watchlist entries expire is not hard. But we don't have good user stories for some of the less obvious use cases. And UI and UX are largely undefined. It seems to me like we first need to re-iterate with product managers and the community, decide on stories and requirements, and then decide on the technical solution.

@daniel please see T100508#2014479 for a clear user story, simple UI and UX idea without the expiry issue you are worried about.

It seems to me like we first need to re-iterate with product managers and the community, decide on stories and requirements, and then decide on the technical solution.

Sounds great.

The user story at T100508#2014479 is just one specific case (and I don't believe it is the most common case). It's a good starting point, but we need more user stories and more UI ideas.

Here is the user story that I would personally like to see supported:
As a vandalism fixer, I would like to add a page to my watchlist for 1 week immediately after I have reverted vandalism to it in order to make sure a page is not re-vandalized. I do not, however, have any long-term interest in the page. I would like to be able to do this without leaving the article itself, i.e. without visiting my watchlist page or watchlist editing interface.

And with all of the other stories that have been requested throughout the discussion of this RFC, all of the points in T124752#2209149 are needed.

  1. wl_id field - Being able to efficiently clear / maintain a watchlist
  2. watchlist_props table - Being able to expire an item after a given amount of time
  3. watchlist_tags table - Multiple watchlists / being able to tag items
  4. watchlist_timestamp field - Being able to remove really old items from a watchlist

"tags" don't exactly give you multiple watchlists, though. They just let you filter your one watchlist. Consider these multiple-watchlist stories, for example:

  • A user wants to put page Example onto their "Actively watch for repeat vandalism" list with expiry one week, "Check occasionally" with expiry two months, and "I should read this someday" with no expiry.
  • A user has had page Example on their "I should read this someday" list for years. Then they add it to their "New research project" list today, because it's relevant to the new project they're starting. They want the date-added timestamp to be correct for each list.

Indeed, so:

  1. add a table containing list information (id, name, creation_date, etc.) and adding a list_id field to the watchlist table.

But as @kaldari said we are really getting away from the point of this RFC / the wish!

Yes, to clarify: this work is being done because of requests from the community, made in the 2014 German Community Wishlist and the 2015 WMF Community Wishlist. Here's the proposal from 2015, contributed by User:Derek Andrews:

"I would like to be able to set an expiry time for watchlist items, of say one week or one month. There are many pages that I do maintenance on or repair vandalism that I would like to watch for a brief period of time, but have no long term interest in. The UI I envisage would just have additional tick boxes: watch this page indefinitely, watch for one week; watch for one month."

The proposal got 18 endorsements and 55 support votes, making it the #12 most-supported wish in the Wishlist Survey. You can see the votes and enthusiastic comments here:

https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Watchlists#Watchlist_timed_expiry

So I'm puzzled by the idea that we need more/better user stories than the one already provided, discussed and overwhelmingly approved in both the German and WMF surveys.

Discussed @Addshore's proposed DB plan. @daniel brought up the point that we probably wouldn't need a watchlist_tags table since the watchlist_props table could also be used for tagging. This is similar to how the page_props table is commonly used for tagging with the value field just set to 1 or empty string and the propname field representing the tag.
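
A rough sketch of how a props table could double as a tag store, following the page_props analogy above. It uses Python and SQLite for illustration; the wlp_ column names, the "tag:" prefix, and the "expiry" property name are all assumptions, not part of any agreed schema.

```python
import sqlite3


def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE watchlist (
            wl_id INTEGER PRIMARY KEY, wl_user INTEGER, wl_title TEXT
        );
        -- Hypothetical generic properties table keyed on the watchlist row.
        CREATE TABLE watchlist_props (
            wlp_item INTEGER, wlp_propname TEXT, wlp_value TEXT,
            PRIMARY KEY (wlp_item, wlp_propname)
        );
    """)
    return conn


def set_prop(conn, item, name, value=""):
    # A tag is just a property whose value is left empty.
    conn.execute(
        "INSERT OR REPLACE INTO watchlist_props VALUES (?, ?, ?)",
        (item, name, value),
    )


def titles_with_tag(conn, user, tag):
    rows = conn.execute(
        "SELECT wl_title FROM watchlist"
        " JOIN watchlist_props ON wlp_item = wl_id"
        " WHERE wl_user = ? AND wlp_propname = ?"
        " ORDER BY wl_title",
        (user, tag),
    )
    return [r[0] for r in rows]
```

In this layout an expiry would be one more row, for example `set_prop(conn, item, "expiry", "20160301000000")`, which is flexible but makes "select everything that has not expired" depend on a string-valued property rather than a typed column.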

I just realized the minutes from the IRC meeting are not on this ticket, but hidden in E138. Here they are now:

  • '''Expiring watch list entries | RFC meeting | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/''' (TimStarling, 22:01:04)
    • ''LINK:'' https://phabricator.wikimedia.org/T124752 (TimStarling, 22:01:11)
    • re: expiry date: addshore believes it's basically been answered, though a properties field might be useful further down the line (robla, 22:05:34)
    • question discussed: is this solving the problem at the right level of generality? (robla, 22:07:24)
    • question discussed: what sort of database/maintenance overhead does this impose? will this require a maintenance script? (robla, 22:12:47)
    • addshore> So, the way the expiry is done in the patch is taken from the protection api currently (which also has expiries) (robla, 22:25:41)
    • question discussed: how quickly does expiry-based watchlist purging need to happen? does the feature need to rely on purging to work? (robla, 22:28:02)
    • questions discussed: is full watchlist cleanup automation required? would tags be helpful? do many people add all pages they edit to their watchlists? (robla, 22:44:32)
    • question: do we want an expiration date, or a watched-since timestamp? (DanielK_WMDE, 22:45:58)
    • <addshore> I guess with a combination of watched since and maybe a tag of number of days to expiry would actually work for most of the main cases for the expiry field (DanielK_WMDE, 22:54:01)

The full log can be found here: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-02-03-22.00.log.html

@DannyH @kaldari @Addshore @daniel @Bmueller I think we should add the notes and outcomes of our meeting from last Monday as well.

@Addshore The task description covers a lot, but is not very concise. This RFC seems essentially ready for another IRC discussion, or even a Last Call without an IRC discussion, if the description was clarified a bit more. In particular, it should state concisely:

  • what exactly is proposed (schema change only? what fields? how will schema migration work?)
  • what drives the proposal (please refer to a ticket describing a user facing feature and shows that this feature has commitment from a team/product owner)
  • how the proposed solution would address the technical needs of the feature in question.

Are you interested in pushing this towards approval soon? Is the commitment to implementing this? If yes, by what team and in what time frame?

Addshore changed the task status from Open to Stalled. Jan 15 2018, 9:52 AM
Addshore claimed this task.

I'm going to mark this as stalled for now, and will come back to this in the next month.
I'll also assign myself so as not to forget.

Right now this isn't high up on the priorities of the WMDE tech wishes team, however we have seen similar requests for some sort of expiring watchlist entries on the wishlists this year.
Also, priorities can change and teams can expand, and I see the value of having an RFC for this approved (even if no work will immediately be done) so that work could continue in the future, potentially by volunteers.

I'm now going to leave this as stalled but unassign myself as I am no longer really working on technical wishes.

kaldari changed the task status from Stalled to Open. Jan 13 2020, 10:15 PM

No longer stalled since wl_id has been implemented on WMF sites.

From a quick glance, this seems (in terms of technical requirements) very similar to the feature of expiring user rights, which was implemented not so long ago (see T12493).

It would make sense to consider the way that was implemented here as well, as it would be consistent with something existing that has known semantics in terms of behaviour and the effort/complexity required for implementation and maintenance.

In a nutshell:

  • Expiries are stored in a database table column.
  • At run-time, expired items are virtually non-existent by using a filtered query that always excludes these.
  • Asynchronously (purely for garbage collection purposes, not for functional requirements), a job is used to prune these from time to time. The job is generally queued after new entries are added (at most 1 behind) and, if persistence is a concern, with a low probability when the data is read.
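
The three points above can be sketched in a few lines of Python. This is an in-memory illustration only: the class and method names are invented, "at most 1 behind" is interpreted here as "at most one purge job pending", and the low-probability queueing on read is omitted for brevity. Expiries are plain integers standing in for timestamps.

```python
from collections import deque


class ExpiringWatchlist:
    def __init__(self):
        self.rows = []       # (user, title, expiry or None)
        self.jobs = deque()  # stand-in for a real job queue

    def _queue_purge(self):
        # Never stack up purge jobs: one pending job is enough.
        if "purge" not in self.jobs:
            self.jobs.append("purge")

    def add(self, user, title, expiry=None):
        self.rows.append((user, title, expiry))
        self._queue_purge()

    def titles(self, user, now):
        # Reads always filter, so expired rows are invisible even before pruning.
        return sorted(
            t for u, t, e in self.rows if u == user and (e is None or e > now)
        )

    def run_jobs(self, now):
        # Garbage collection only; returns how many expired rows were dropped.
        removed = 0
        while self.jobs:
            self.jobs.popleft()
            before = len(self.rows)
            self.rows = [r for r in self.rows if r[2] is None or r[2] > now]
            removed += before - len(self.rows)
        return removed
```

The design choice worth noting is that correctness never depends on the job having run: a delayed or failed purge costs storage, not behaviour.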

I support matching the pattern that was used for usergroup expiry, which I think was mostly implemented by @TTO. For the schema changes, this would mean adding a varbinary column to the watchlist table to store an expiration timestamp and also a corresponding new index for that column. Maybe @TTO could elaborate on the rest of it (queries, garbage collection, etc.).

We're working on this as part of the wishlist items. We're going with the strategy of adding a new table that references the watchlist item and has an expiration.

See discussions https://phabricator.wikimedia.org/T235005#5714344 and the request/discussion about adding the table https://phabricator.wikimedia.org/T240094

To clarify -- the wl_id was already added to the table. When we checked into this feature, we believed this RFC to be done because the details (wl_id in the watchlist table) were done, even though the original proposers then deprioritized the work on the feature. When we consulted, we were told no RFC is needed for the actual feature.

Since wl_id was added to the watchlist, and since we anticipate most entries to not have any expiration (most rows will have null value) it makes more sense to add a new table instead.

We are in the process of clarifying our needs vs the operation of this table with DBAs. More information on the linked tickets, and in the project page.

I'm not sure why this ticket wasn't resolved when the column was added to the watchlist table.

I don't think we want to get into the work to add another column to the watchlist table.

The solution we are working toward is functionally what Krinkle suggests, except we are doing it with a second table rather than a new column.

There are potentially performance benefits to this pattern as well as productivity and maintenance benefits.

Rather than expanding an already large table with a column that will be null for the majority of records, we will merely join to the new table. To my mind, this is a much cleaner and easier solution all around.
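
A sketch of the second-table approach, again in Python with SQLite for illustration. The table and column names (watchlist_expiry, we_item, we_expiry) and the simplified watchlist columns are placeholders, not confirmed by this thread. Only items that actually have an expiry get a row in the side table, and reads use a LEFT JOIN so that items without a row count as never expiring.

```python
import sqlite3

SCHEMA = """
CREATE TABLE watchlist (
    wl_id    INTEGER PRIMARY KEY,
    wl_user  INTEGER NOT NULL,
    wl_title TEXT NOT NULL
);
-- Hypothetical side table: one row only for items that have an expiry.
CREATE TABLE watchlist_expiry (
    we_item   INTEGER PRIMARY KEY REFERENCES watchlist (wl_id),
    we_expiry TEXT NOT NULL
);
CREATE INDEX we_expiry_idx ON watchlist_expiry (we_expiry);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def watch(conn, user, title, expiry=None):
    cur = conn.execute(
        "INSERT INTO watchlist (wl_user, wl_title) VALUES (?, ?)", (user, title)
    )
    if expiry is not None:
        conn.execute(
            "INSERT INTO watchlist_expiry VALUES (?, ?)", (cur.lastrowid, expiry)
        )
    return cur.lastrowid


def watched_titles(conn, user, now):
    # No matching expiry row means we_expiry is NULL, i.e. watched indefinitely.
    rows = conn.execute(
        "SELECT wl_title FROM watchlist"
        " LEFT JOIN watchlist_expiry ON we_item = wl_id"
        " WHERE wl_user = ? AND (we_expiry IS NULL OR we_expiry > ?)"
        " ORDER BY wl_title",
        (user, now),
    )
    return [r[0] for r in rows]
```

The existing watchlist table is left untouched in this layout, and the index that expiry queries need sits on the much smaller side table.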

@aezell, @Mooeypoo - Makes sense. The original implementation plan was to create a new table called watchlist_props, but without more clear use cases, that feels like it might be over-abstraction. This plan sounds like a good compromise between the two existing ideas.

I don't think we want to get into the work to add another column to the watchlist table.

The solution we are working toward is functionally what Krinkle suggests, except we are doing it with a second table rather than a new column.

There are potentially performance benefits to this pattern as well as productivity and maintenance benefits.

Rather than expanding an already large table with a column that will be null for the majority of records, we will merely join to the new table. To my mind, this is a much cleaner and easier solution all around.

+30 to all of that.

Krinkle renamed this task from [RFC] Expiring watch list entries to RFC: Expiring watch list entries. Apr 3 2020, 11:37 PM
Krinkle updated the task description.
Krinkle moved this task from Under discussion to P3: Explore on the TechCom-RFC board.

(Just realised I'm too late): In support of the tags idea which would also cleanly allow for multiple watchlists (T3492) with the method described in T182297. Tags would be quite a powerful concept, and expiry times could've been an 'action' based on tags.

I haven't been following along with the progress of the feature, but as I understand it lots of work has been done. Should this RFC be closed now?

Looking at the activity on https://phabricator.wikimedia.org/tag/expiring-watchlist-items/ I'll ping @ARamirez_WMF and per edits on https://www.mediawiki.org/w/index.php?title=Help:Watchlist_expiry&action=history perhaps @ifried

kaldari claimed this task.

Yes, since this feature has already been implemented and deployed (although not to the bigger Wikipedias yet), I think we can close the RFC. See https://meta.wikimedia.org/wiki/Community_Tech/Watchlist_Expiry and Expiring-Watchlist-Items for further updates.

Krinkle moved this task from P3: Explore to P4: Tune on the TechCom-RFC board.

I'm glad the comments were helpful :) - I do note, however, that this never went on Last Call, so it's possible others may have been holding out or not yet gotten around to reviewing the chosen solution. I'll bring it up this week and see if it makes sense to perhaps solicit at least some wider awareness and close it formally two weeks from now. Re-opening so that it's visible on the board this/next week.

On Last Call until Wed 25 Nov.

Krinkle edited projects, added TechCom-RFC (TechCom-RFC-Closed); removed TechCom-RFC.

No comments raised. I suppose it was good to have some exposure for it given the last-minute way it landed into core behind a feature flag. Any late arrivals, please file new tickets and follow the regular development cadence to address any questions or concerns.

This RFC is approved with no concerns being raised during the last call period.