
Clean up GrowthExperiments-related user_properties rows
Open, Low, Public

Description

Background

A significant percentage of user_properties rows are owned by GrowthExperiments. We should invest some effort into removing the rows we don't need or consolidating the properties. We might benefit from conditional defaults, which we created for Echo as part of T353225: Echo: Make use of conditional user defaults.

This is a March 2022 analysis of the amount of user properties we consume:

In [23]: pivot_df.sort_values('Growth %', ascending=False)
Out[23]:
Variable   Growth  Growth_no_mentorship     Total  Growth %
wiki
arwiki    2436441               2237462  14499219     16.80
viwiki     778427                715775   5022918     15.50
kowiki     431262                391826   3449192     12.50
cswiki     297077                270141   2570822     11.56
frwiki    2043960               1899497  22215626      9.20
bnwiki     162257                156591   2107438      7.70

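The `Growth %` column looks like `Growth / Total * 100` rounded to two decimals, and recomputing it from the other columns reproduces every value in the table. A quick check in Python (the query behind the `Growth` and `Growth_no_mentorship` columns isn't shown here, so this only verifies the arithmetic, not what was counted):

```python
# Row counts copied from the March 2022 pivot table above:
# wiki -> (Growth rows, Growth rows without mentorship, total user_properties rows)
COUNTS_2022 = {
    "arwiki": (2436441, 2237462, 14499219),
    "viwiki": (778427, 715775, 5022918),
    "kowiki": (431262, 391826, 3449192),
    "cswiki": (297077, 270141, 2570822),
    "frwiki": (2043960, 1899497, 22215626),
    "bnwiki": (162257, 156591, 2107438),
}


def growth_share(growth_rows: int, total_rows: int) -> float:
    """Percentage of all user_properties rows that belong to Growth, 2 decimals."""
    return round(growth_rows / total_rows * 100, 2)


for wiki, (growth, no_mentorship, total) in COUNTS_2022.items():
    print(
        f"{wiki}: {growth_share(growth, total):.2f}% "
        f"({growth_share(no_mentorship, total):.2f}% without mentorship)"
    )
```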

This is the list of properties occupying the most rows at that time:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select up_property, count(*) from user_properties where up_property like 'growthexperiments-%' or up_property='welcomesurvey-responses' group by up_property having count(*) > 1500 order by count(*) desc;
+--------------------------------------------------------+----------+
| up_property                                            | count(*) |
+--------------------------------------------------------+----------+
| growthexperiments-help-panel-tog-help-panel            |    40789 |
| growthexperiments-homepage-enable                      |    35194 |
| growthexperiments-homepage-pt-link                     |    35035 |
| growthexperiments-tour-help-panel                      |    32918 |
| growthexperiments-tour-homepage-mentorship             |    32007 |
| growthexperiments-mentor-id                            |    26936 |
| growthexperiments-tour-homepage-discovery              |    20520 |
| growthexperiments-homepage-variant                     |    19715 |
| growthexperiments-tour-homepage-welcome                |    17258 |
| growthexperiments-homepage-suggestededits-activated    |    12661 |
| growthexperiments-homepage-suggestededits-preactivated |     7211 |
| growthexperiments-homepage-se-filters                  |     6098 |
| growthexperiments-homepage-se-ores-topic-filters       |     4030 |
| welcomesurvey-responses                                |     2793 |
| growthexperiments-homepage-tutorial-completed          |     2646 |
+--------------------------------------------------------+----------+
15 rows in set (0.107 sec)


This seems to be getting worse. For example, here is an October 2025 overview:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select count(*) from user_properties;
+----------+
| count(*) |
+----------+
|  2989718 |
+----------+
1 row in set (1.832 sec)

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select count(*) from user_properties where up_property like 'growthexperiments-%';
+----------+
| count(*) |
+----------+
|   564289 |
+----------+
1 row in set (0.143 sec)

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select 564289 / 2989718 * 100;
+------------------------+
| 564289 / 2989718 * 100 |
+------------------------+
|                18.8743 |
+------------------------+
1 row in set (0.001 sec)


mysql:research@dbstore1008.eqiad.wmnet [arwiki]> select count(*) from user_properties;
+----------+
| count(*) |
+----------+
| 17455549 |
+----------+
1 row in set (20.160 sec)

mysql:research@dbstore1008.eqiad.wmnet [arwiki]> select count(*) from user_properties where up_property like 'growthexperiments-%';
+----------+
| count(*) |
+----------+
|  4308029 |
+----------+
1 row in set (3.378 sec)

mysql:research@dbstore1008.eqiad.wmnet [arwiki]> select 4308029 / 17455549 * 100;
+--------------------------+
| 4308029 / 17455549 * 100 |
+--------------------------+
|                  24.6800 |
+--------------------------+
1 row in set (0.000 sec)


We went from ~11.6% of rows to ~18.9% of rows on cswiki and from ~16.8% to ~24.7% on arwiki.
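The comparison can be recomputed as follows. This is only a sketch of the arithmetic: the March 2022 `Growth` column may not have been computed with exactly the same `up_property LIKE 'growthexperiments-%'` filter as the October 2025 queries, so treat the percentage-point deltas as approximate.

```python
# March 2022 shares (from the pivot table) and October 2025 counts (from the queries above).
SHARE_2022 = {"cswiki": 11.56, "arwiki": 16.80}
COUNTS_2025 = {
    # wiki -> (rows LIKE 'growthexperiments-%', all user_properties rows)
    "cswiki": (564289, 2989718),
    "arwiki": (4308029, 17455549),
}


def share_pct(growth_rows: int, total_rows: int) -> float:
    """Growth's share of user_properties rows, in percent."""
    return growth_rows / total_rows * 100


def change_pp(wiki: str) -> float:
    """Change in percentage points between March 2022 and October 2025."""
    growth, total = COUNTS_2025[wiki]
    return share_pct(growth, total) - SHARE_2022[wiki]


for wiki, (growth, total) in COUNTS_2025.items():
    print(
        f"{wiki}: {SHARE_2022[wiki]:.2f}% -> {share_pct(growth, total):.2f}% "
        f"(+{change_pp(wiki):.1f} percentage points)"
    )
```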

Problem

As of now, this essentially prevents us from adding new user_properties rows, as we are already their most prominent user.

Solution

Lower GrowthExperiments' share of user_properties rows to below XX% on average across the pilot wikis (exact number TBD).


Event Timeline

Restricted Application added subscribers: revi, Aklapper.

T54777#7724456 would let us avoid storing preferences and A/B flags on account creation. For tours we could probably reduce storage size (especially once temp users are introduced) by making "not seen" the default value. Changing the default value will be cumbersome though.

Two ideas:

  1. Reset user properties to defaults after some period of time (180 days? 90 days?) for inactive users. We delete welcome survey responses after 90 days; it would be like that, but for other properties.
  2. Use cookies on account creation for tours instead of database backed user properties

> Two ideas:
>
>   1. Reset user properties to defaults after some period of time (180 days? 90 days?) for inactive users. We delete welcome survey responses after 90 days; it would be like that, but for other properties.

I'm not sure that's a good idea. I'm afraid it would somewhat negate the effect the positive reinforcement project has on users, as it would make contributing harder when a user comes back. It's also quite counter-intuitive (and, unless the account is vanished in full, unexpected by users).

>   2. Use cookies on account creation for tours instead of database backed user properties

This sounds like a good idea to me. We might even be able to use a long-expiry WAN cache – I guess it's quite unlikely that users will see a tour for the first time more than a month after signing up, so that might be enough.
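A minimal server-side sketch of the cookie variant, using Python's standard `http.cookies` module. The cookie name `geSeenTours`, the `|` separator and the 30-day `max-age` are assumptions for illustration (the expiry follows the "more than a month" estimate above); since tours run client-side, a real implementation would more likely live in JavaScript.

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name; nothing in GrowthExperiments is known to use it.
SEEN_TOURS_COOKIE = "geSeenTours"
# Roughly one month, following the estimate above.
ONE_MONTH_SECONDS = 30 * 24 * 60 * 60
# "|" is a legal cookie-value character, so the value needs no quoting.
SEPARATOR = "|"


def seen_tours(cookie_header: str) -> set:
    """Parse the set of already-seen tours from a Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if SEEN_TOURS_COOKIE not in cookie:
        return set()
    return {t for t in cookie[SEEN_TOURS_COOKIE].value.split(SEPARATOR) if t}


def mark_tour_seen(cookie_header: str, tour: str) -> SimpleCookie:
    """Build the cookie to send back (Set-Cookie) with `tour` added to the seen set."""
    seen = seen_tours(cookie_header) | {tour}
    out = SimpleCookie()
    out[SEEN_TOURS_COOKIE] = SEPARATOR.join(sorted(seen))
    out[SEEN_TOURS_COOKIE]["max-age"] = ONE_MONTH_SECONDS
    out[SEEN_TOURS_COOKIE]["path"] = "/"
    return out
```

For example, `mark_tour_seen("geSeenTours=help-panel", "homepage-welcome").output()` yields a `Set-Cookie` header listing both tours. The usual trade-offs apply: a second device starts with an empty set, and every request carries the cookie.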

I think this will roughly fall into three buckets:

  • A/B test placements and feature flags: if the core functionality mentioned in T54777#7724456 becomes available, these can probably be rewritten to be functionally equivalent but not store any preference data (other than users manually opting out of their A/B test placement).
  • tour "seen" flags, flags for the blue dot thingie: for more obscure tours, it might be fine to stick with the DB. For tours shown to a large fraction of new users (which is the case for most of our tours) we should be cautious about DB use. We could use cookies (disadvantages: users will see the tour again on a different device, inflates response size), local storage (disadvantages: users will see the tour again on a different device, tours must be fully client-side), use the DB with time-limits (i.e. don't show the tour once the user is X days old, whether or not they have seen it), use the DB in some more compact way (bitflags?), maybe use global user preferences (less storage space needed, it's on x1 where space is cheaper, probably better UX). Or maybe once A/B / feature-flag preferences are fixed this is not such a big deal anymore.
  • genuine data storage, e.g. welcome survey or mentor information. Probably not a big deal in itself if the other issues are solved.
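To illustrate the "bitflags" option: the four `growthexperiments-tour-*` properties from the cswiki list could be folded into one integer-valued property, so each user has at most one tour row instead of up to four. A sketch in Python; the bit assignments and the single combined property (e.g. a hypothetical `growthexperiments-tours-seen`) are assumptions, not existing GrowthExperiments code.

```python
from enum import IntFlag


class SeenTours(IntFlag):
    """Hypothetical bit assignment; bits must never be reused or reordered."""
    NONE = 0
    HELP_PANEL = 1 << 0           # growthexperiments-tour-help-panel
    HOMEPAGE_MENTORSHIP = 1 << 1  # growthexperiments-tour-homepage-mentorship
    HOMEPAGE_DISCOVERY = 1 << 2   # growthexperiments-tour-homepage-discovery
    HOMEPAGE_WELCOME = 1 << 3     # growthexperiments-tour-homepage-welcome


def decode(stored_value: str) -> SeenTours:
    """Property values are strings; an empty/missing value means "nothing seen"."""
    return SeenTours(int(stored_value or "0"))


def mark_seen(stored_value: str, tour: SeenTours) -> str:
    """Return the new value of the single combined property."""
    return str(int(decode(stored_value) | tour))


def has_seen(stored_value: str, tour: SeenTours) -> bool:
    return tour in decode(stored_value)
```

Note that this doesn't remove the default-value problem: a missing row still decodes to "nothing seen".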

What about using the WANObjectCache for this? When a user registers, we'd create entries in the cache for each tour/navigation guide. As the user completes the tours, interacts with the blue dot, etc., we'd delete the cache item.
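A rough sketch of that flow. WANObjectCache itself is a PHP class in MediaWiki core; here a small in-memory cache with expiry stands in for it, and the key format and 30-day TTL are hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical TTL: roughly one month.
PENDING_TTL_SECONDS = 30 * 24 * 60 * 60


class TtlCache:
    """Tiny in-memory stand-in for a shared cache with expiry (not WANObjectCache)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[object, float]] = {}

    def set(self, key: str, value: object, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like an evicted entry
            return None
        return value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def pending_key(user_id: int, guide: str) -> str:
    # Hypothetical key format.
    return f"growthexperiments:pending-guide:{user_id}:{guide}"


def on_registration(cache: TtlCache, user_id: int, guides: Iterable[str]) -> None:
    """Mark every tour/navigation guide as pending for a new account."""
    for guide in guides:
        cache.set(pending_key(user_id, guide), True, PENDING_TTL_SECONDS)


def should_show(cache: TtlCache, user_id: int, guide: str) -> bool:
    """Show a guide only while its pending entry exists."""
    return cache.get(pending_key(user_id, guide)) is not None


def on_completed(cache: TtlCache, user_id: int, guide: str) -> None:
    """The user finished the tour or dismissed the blue dot: forget the entry."""
    cache.delete(pending_key(user_id, guide))
```

With this inverted design, an entry evicted early means the tour is skipped rather than shown twice, which seems like the safer failure mode.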

DMburugu lowered the priority of this task from Medium to Low. (Jan 9 2023, 4:53 PM)

Updating the list of "common offenders" for cswiki:

mysql:research@dbstore1007.eqiad.wmnet [cswiki]> select up_property, count(*) from user_properties where up_property like 'growthexperiments-%' or up_property='welcomesurvey-responses' group by up_property having count(*) > 1500 order by count(*) desc;

+-------------------------------------------------------------+----------+
| up_property                                                 | count(*) |
+-------------------------------------------------------------+----------+
| growthexperiments-help-panel-tog-help-panel                 |    85524 |
| growthexperiments-homepage-enable                           |    79966 |
| growthexperiments-homepage-pt-link                          |    79785 |
| growthexperiments-tour-help-panel                           |    77610 |
| growthexperiments-tour-homepage-mentorship                  |    75659 |
| growthexperiments-tour-homepage-discovery                   |    50460 |
| growthexperiments-tour-homepage-welcome                     |    34965 |
| growthexperiments-homepage-se-filters                       |    26997 |
| growthexperiments-homepage-suggestededits-activated         |    23116 |
| growthexperiments-homepage-se-ores-topic-filters            |    10807 |
| growthexperiments-homepage-suggestededits-preactivated      |     7210 |
| growthexperiments-homepage-suggestededits-guidance-blue-dot |     2803 |
| growthexperiments-homepage-tutorial-completed               |     2621 |
| growthexperiments-mentor-questions                          |     2480 |
| welcomesurvey-responses                                     |     1528 |
+-------------------------------------------------------------+----------+
15 rows in set (0.170 sec)

I checked the current state in T420369: Analyze the number of user properties used by the Growth team. Here is a summary of what I found (full data are available).

  • The situation is progressively getting worse. In March 2022 (4 years ago), Growth properties made up ~17% of all user_properties rows on arwiki and ~12% on cswiki. Today, they make up 32% on arwiki and 26% on cswiki: an increase of about 15 percentage points on arwiki and 14 on cswiki, which is... alarming.
  • Most of the problem can be attributed to growthexperiments-homepage-enable, growthexperiments-homepage-pt-link and growthexperiments-help-panel-tog-help-panel, which we simply enable for everyone on signup (which we should...stop doing).
  • Given how fast the problem is growing, our current approach is clearly unsustainable, and we need to do something.
  • Many properties seem to be mergeable (for example, do we really need users to be able to enable the Homepage, the Help panel and the navigation link separately? What about a single "Enable newcomer features" user property? The Help panel and the Homepage are heavily integrated with each other already anyway...)
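One way a merged property could be introduced without changing any existing explicit choice: keep honoring legacy rows where they exist, and only consult the merged property otherwise. A Python sketch of the lookup; `growthexperiments-newcomer-features` is a hypothetical name, and toggles are assumed to be stored as `"1"`/`"0"`.

```python
from typing import Dict

# Hypothetical name for a single merged preference; no such property exists today.
MERGED_PROPERTY = "growthexperiments-newcomer-features"
LEGACY_PROPERTIES = frozenset({
    "growthexperiments-homepage-enable",
    "growthexperiments-help-panel-tog-help-panel",
    "growthexperiments-homepage-pt-link",
})


def feature_enabled(stored: Dict[str, str], legacy_property: str, default: bool) -> bool:
    """Effective value of one legacy toggle, given a user's stored properties.

    `stored` maps up_property -> up_value for the rows this user has.
    """
    if legacy_property not in LEGACY_PROPERTIES:
        raise ValueError(f"not a merged toggle: {legacy_property}")
    if legacy_property in stored:
        # An explicit legacy row keeps working, so no existing choice changes.
        return stored[legacy_property] == "1"
    if MERGED_PROPERTY in stored:
        # Otherwise the single "Enable newcomer features" switch decides.
        return stored[MERGED_PROPERTY] == "1"
    return default
```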

I believe that, at the very least, we should stop creating user properties in onLocalUserCreated. That would stop the problem from growing (or at least from growing as rapidly as it does now). Other user properties can be dealt with on an ad hoc basis (some of them can even be deleted relatively cheaply, such as growthexperiments-tour-*).
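Not writing rows at signup could be paired with conditional defaults (as done for Echo in T353225): instead of storing `growthexperiments-homepage-enable` for every new account, the default itself would depend on the registration date. A sketch of the resolution logic only (not MediaWiki's actual implementation), assuming the flag is stored as `"1"`/`"0"` and a hypothetical cut-off equal to the date the hook stops writing rows, so accounts that already have a row keep their current value.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical cut-off: the date onLocalUserCreated stops writing
# growthexperiments-homepage-enable rows.
CUTOFF = datetime(2026, 1, 1, tzinfo=timezone.utc)


def homepage_enabled(stored_value: Optional[str], registered_at: datetime) -> bool:
    """Effective value of growthexperiments-homepage-enable for one user.

    `stored_value` is the user's up_value, or None if there is no row.
    """
    if stored_value is not None:
        # An existing row (written at signup before the cut-off, or set
        # manually in preferences) always wins.
        return stored_value == "1"
    # No row: accounts created on or after the cut-off get the feature by default.
    return registered_at >= CUTOFF
```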

I think T406724 will take care of this too.

I wouldn't count on that. If we drop growthexperiments-homepage-enable, we would just disable the newcomer features for those users, which is probably not something we want to do. Also, the Growth features haven't been around for that long (dropping rows for users who haven't logged in for 5 years would impact relatively few users, because Growth features weren't yet standard five years ago).

I mostly meant that it reduces the total number of rows (we have dropped around 100M rows from enwiki), which also explains why the percentage has grown this much. But I totally agree that it should be fixed ASAP.
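To illustrate why dropping other rows raises Growth's share even when Growth's own row count doesn't change: the share is `growth_rows / total_rows`, so shrinking only the denominator pushes it up. A worked example with the October 2025 cswiki counts and a purely hypothetical cleanup of 500,000 non-Growth rows:

```python
def share_pct(growth_rows: int, total_rows: int) -> float:
    """Growth's share of all user_properties rows, in percent."""
    return growth_rows / total_rows * 100


# cswiki, October 2025, from the queries in the task description.
GROWTH_ROWS = 564289
TOTAL_ROWS = 2989718
# Purely hypothetical: a cleanup removes this many non-Growth rows and no Growth rows.
REMOVED_OTHER_ROWS = 500_000

before = share_pct(GROWTH_ROWS, TOTAL_ROWS)
after = share_pct(GROWTH_ROWS, TOTAL_ROWS - REMOVED_OTHER_ROWS)
print(f"Growth share: {before:.2f}% -> {after:.2f}%")
```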

Hi, is there anything I can do to help moving this forward? Thank you!

Mh, I'm a bit hesitant about merging growthexperiments-homepage-enable, growthexperiments-homepage-pt-link and growthexperiments-help-panel-tog-help-panel into "newcomer features". While that might have been a good idea initially, doing it retroactively might cause some backlash.

Deleting growthexperiments-tour-* for all accounts older than, say, 1 month makes sense to me.
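A sketch of how such a cleanup could select rows, in Python over in-memory data. One caveat: assuming the default for these properties means "not seen", deleting a row would make the tour eligible again, so the deletion is presumably only safe together with a rule that tours are never shown to older accounts (the time-limit idea from earlier). The batch size is hypothetical, and a real job would issue batched DELETEs against user_properties rather than filter Python lists.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Iterator, List, Tuple

TOUR_PREFIX = "growthexperiments-tour-"
MIN_ACCOUNT_AGE = timedelta(days=30)  # "say, 1 month"
BATCH_SIZE = 1000                     # hypothetical

Row = Tuple[int, str]  # (user id, up_property)


def deletable_rows(rows: Iterable[Row], registered: Dict[int, datetime],
                   now: datetime) -> Iterator[Row]:
    """Tour flags belonging to accounts older than MIN_ACCOUNT_AGE."""
    for user_id, prop in rows:
        if prop.startswith(TOUR_PREFIX) and now - registered[user_id] > MIN_ACCOUNT_AGE:
            yield user_id, prop


def batched(rows: Iterable[Row], size: int = BATCH_SIZE) -> Iterator[List[Row]]:
    """Group rows so each DELETE touches a bounded number of rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def should_show_tour(seen: bool, registered_at: datetime, now: datetime) -> bool:
    """Never show tours to older accounts, so a deleted flag can't bring a tour back."""
    return not seen and now - registered_at <= MIN_ACCOUNT_AGE
```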

> I believe we should at the very least, stop creating user properties in onLocalUserCreated. That would stop the problem from growing (or at least, from growing as rapidly as it does now).

How would we then serve the purpose of the options we are setting there? With a conditional config?

(In general, I hope this problem will largely go away as Special:PersonalDashboard replaces Special:Homepage. But this is likely going to take a good while longer.)