Currently, we have problems with overly large watchlist and user_properties, and it's adding non-negligible cost to our maintenance. A rather low-hanging fruit could be to simply purge these rows if the users don't log in or any other action such as editing, saving preferences, watching a page, etc. (tracked via user_touched) after a certain time.
My proposal goes as follows:
- If the user hasn't logged-in (or any other action) for two years. Remove some user_properties rows such as rememberpassword, growthexperiments-tour-* (aka the tour has been seen and don't need to be repeated), etc.
- After five years, all user_properties rows should be deleted
- After ten years, their watchlist can be purged. We can send a reminder to the user via email one month before deletion of watchlist, but I don't think it's needed for user_properties purge.
This is a very common practice in the industry. Storage costs us real donor money and if the users have moved on, passed away, been banned, etc. we shouldn't be holding on to their settings or their watchlist forever.
For example. Even going with almost six years. We have this:
mysql:research@dbstore1008.eqiad.wmnet [enwiki]> select count(*) from user_properties; +-----------+ | count(*) | +-----------+ | 265352966 | +-----------+ 1 row in set (5 min 38.219 sec) mysql:research@dbstore1008.eqiad.wmnet [enwiki]> select count(*) from user_properties join user on user_id = up_user where user_touched < '20200000000000'; +-----------+ | count(*) | +-----------+ | 149216973 | +-----------+ 1 row in set (3 min 3.951 sec)
It's 56% of enwiki's user properties table.
See also: