Support conditional defaults for user properties
Closed, Resolved | Public | 8 Estimated Story Points

Description

Description of the problem

See parent task T54777. In short, several extensions set user options from LocalUserCreated to give users different defaults based on their user type and time of registration (see the more detailed list in Option 1). This has led to millions of rows accumulating in the user_properties table, and the table will keep growing with every newly registered user until we implement a solution.

Proposed solutions

Option 1
Splitting out @tstarling's proposal from T54777#7724456 into its own task so it's easier to reference:

I'm thinking about this in terms of the cost of deployment of IP masking. It's proposed to have a user and globaluser row for "temporary" accounts, and there will be a lot of temporary accounts. But probably most extensions will want to treat temporary accounts the same as anonymous users, leaving them with default preferences. Delivering welcome banners and the like is probably best done after the user explicitly creates an account.

So I'm not sure we really need this, but I still had better do a brain dump in case it's needed now or in the future.

Currently several extensions are setting user options from LocalUserCreated. Here's what they are trying to achieve:

  • The default preference value for new users may be a value different from the default for existing users.
  • The default for new users may change when a new version of the extension is deployed.
  • The default for new users may change when a configuration variable changes.
  • New users may be assigned to a random A/B test bucket and then receive different preferences depending on their bucket.
  • Possibly the default should be set based on global newness rather than local account autocreation.
  • The default for new users may later become the default for everyone, or vice versa.

My idea for efficiently achieving those requirements is to have extensions statically declare new user preferences. Have a new table which holds these declarations, say user_property_default:

CREATE TABLE user_property_default (
    upd_id INT UNSIGNED AUTO_INCREMENT NOT NULL,
    upd_property VARBINARY(255) NOT NULL,
    upd_user_type INT UNSIGNED NOT NULL,
    upd_min_user INT UNSIGNED NOT NULL,
    upd_min_bucket INT UNSIGNED NOT NULL,
    upd_value BLOB,
    PRIMARY KEY (upd_id),
    UNIQUE KEY (upd_property, upd_user_type, upd_min_user, upd_min_bucket)
);

To figure out a default user preference value for a given user, you search for the user_property_default row with the highest minimum user that is less than or equal to the given user_id:

SELECT upd_value FROM user_property_default 
WHERE upd_property='$prefname' 
   AND upd_user_type='$my_type'
   AND upd_min_user <= '$my_id'
   AND upd_min_bucket <= '$my_bucket'
ORDER BY upd_min_user DESC, upd_min_bucket DESC
LIMIT 1;
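
A minimal sketch of how that lookup behaves, using Python and an in-memory SQLite database as a stand-in for MariaDB. The property name `example-pref`, the sample rows, and the helper `default_for` are made up for illustration; only the table shape and the query come from the proposal above.

```python
import sqlite3

# In-memory SQLite stands in for the real database; column types are simplified.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_property_default (
        upd_id INTEGER PRIMARY KEY AUTOINCREMENT,
        upd_property TEXT NOT NULL,
        upd_user_type INTEGER NOT NULL,
        upd_min_user INTEGER NOT NULL,
        upd_min_bucket INTEGER NOT NULL,
        upd_value TEXT,
        UNIQUE (upd_property, upd_user_type, upd_min_user, upd_min_bucket)
    )
""")
conn.executemany(
    "INSERT INTO user_property_default "
    "(upd_property, upd_user_type, upd_min_user, upd_min_bucket, upd_value) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("example-pref", 0, 0, 0, "old"),        # applies to every user
        ("example-pref", 0, 5000, 0, "new"),     # users with user_id >= 5000
        ("example-pref", 0, 5000, 990, "test"),  # ...who are also in bucket >= 990
    ],
)


def default_for(prop, user_type, user_id, bucket):
    """Run the proposed lookup; returns None when no row matches."""
    row = conn.execute(
        "SELECT upd_value FROM user_property_default "
        "WHERE upd_property = ? AND upd_user_type = ? "
        "AND upd_min_user <= ? AND upd_min_bucket <= ? "
        "ORDER BY upd_min_user DESC, upd_min_bucket DESC LIMIT 1",
        (prop, user_type, user_id, bucket),
    ).fetchone()
    return row[0] if row else None
```

With these sample rows, a user with ID 100 gets "old" regardless of bucket, a user with ID 6000 in bucket 500 gets "new", and a user with ID 6000 in bucket 995 gets "test".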

When a new user is created, the configured declaration is compared against the current (highest upd_min_user) value in the database for each preference. If the declaration has changed, a new row is inserted into the database with upd_min_user being the user_id of the user being created.
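
A rough sketch of that comparison step, in Python with a plain list standing in for the table. The function names and the dict-based row format are hypothetical, and for brevity the sketch only tracks the upd_min_bucket=0 row of each preference.

```python
def latest_row(rows, prop, user_type):
    """Return the bucket-0 row with the highest min_user, or None if there is none."""
    candidates = [
        r for r in rows
        if r["property"] == prop and r["user_type"] == user_type and r["min_bucket"] == 0
    ]
    return max(candidates, key=lambda r: r["min_user"]) if candidates else None


def on_user_created(rows, declared, user_type, new_user_id):
    """Compare declared defaults with the stored ones; insert a row per change.

    `declared` maps preference name -> declared default for new users.
    Returns the names of the preferences for which a row was inserted.
    """
    inserted = []
    for prop, value in declared.items():
        current = latest_row(rows, prop, user_type)
        if current is None or current["value"] != value:
            rows.append({
                "property": prop,
                "user_type": user_type,
                "min_user": new_user_id,  # the new default starts with this user
                "min_bucket": 0,
                "value": value,
            })
            inserted.append(prop)
    return inserted
```

The point of the design is visible here: as long as the declaration is unchanged, creating a user adds no rows at all.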

The user bucket would just be "hash(user_id) mod 1000" or something similar. Most user_property_default rows would have upd_min_bucket=0 and so would catch all buckets. If you insert a row with upd_min_bucket=990 then it will only take effect for 1% of users. There would always be a fallback with upd_min_bucket=0 so that the search doesn't continue back to previous upd_min_user values.
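
As an illustration of the bucketing arithmetic, here is a small Python sketch. The choice of SHA-256 is an assumption (the proposal only says "hash(user_id) mod 1000 or something similar"), and both function names are hypothetical.

```python
import hashlib

NUM_BUCKETS = 1000


def user_bucket(user_id: int) -> int:
    """Stable bucket in [0, 1000) derived from the user ID (hash choice is an assumption)."""
    digest = hashlib.sha256(str(user_id).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def share_of_users(min_bucket: int) -> float:
    """Fraction of buckets that a row with the given upd_min_bucket applies to."""
    return (NUM_BUCKETS - min_bucket) / NUM_BUCKETS
```

A row with upd_min_bucket=990 covers buckets 990 to 999, which is 10 out of 1000, matching the 1% figure above; a row with upd_min_bucket=0 covers all of them.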

upd_user_type would be a small integer to allow the default to depend on autocreate flag and "temporary" status.

Sprinkle in some caching and stampede protection and I think it would mostly work. If the declared default changed depending on the request parameters, it would cause a bit of a mess, and preventing that would come down to code review.

(See the last few comments in T54777: user_properties table bloat for further discussion, including whether the table should be normalized, whether we could use configuration instead of DB rows, or whether the functionality could be simulated based on user ID hashes.)

Option 2
Splitting out @Urbanecm_WMF's proposal from T321527#9182033:

Thought: Maybe it would make sense to instead have time-based defaults set in the configuration (via UserGetDefaultOptions, for example), and when the time is to disable A/B testing, simply make a code change to remove the registration-based defaults and be done? Granted, the timestamp of A/B testing start might not (and probably will not be) same in all installations, but even then, the default options can be managed from a MW config variable as well. I imagine this can live in extension.json, similar to how defaults are configured now. This can look like this:

{
     ...,
     "DefaultUserOptions": {
          "same-default-for-everyone": "default value for all users",
          "bucketed-defaults": "fallback default value"
     },
     "BucketedDefaultUserOptions": {
          "bucketed-defaults": {
               "20080101000000": "default value for users who registered after January 01, 2008",
               "20230101000000": "default value for users who registered after January 01, 2023"
          }
     }
}
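
A possible way to resolve such a configuration, sketched in Python. The function name `default_option` is hypothetical, and treating "registered after" as "registered at or after the timestamp" is an assumption; the proposal does not specify the boundary or what to do with users whose registration time is unknown.

```python
DEFAULT_USER_OPTIONS = {
    "same-default-for-everyone": "default value for all users",
    "bucketed-defaults": "fallback default value",
}
BUCKETED_DEFAULT_USER_OPTIONS = {
    "bucketed-defaults": {
        "20080101000000": "value for 2008+ registrations",
        "20230101000000": "value for 2023+ registrations",
    },
}


def default_option(name, registration):
    """Pick the default for `name` given a 14-digit registration timestamp (or None)."""
    thresholds = BUCKETED_DEFAULT_USER_OPTIONS.get(name, {})
    if registration is not None:
        # Timestamps are fixed-width digit strings, so string order is chronological.
        eligible = [ts for ts in thresholds if ts <= registration]
        if eligible:
            return thresholds[max(eligible)]
    # No matching threshold (or unknown registration): use the ordinary default.
    return DEFAULT_USER_OPTIONS.get(name)
```

Ending the experiment then amounts to deleting the entry from the bucketed section, after which every user falls back to the ordinary default.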
Some potential use cases:
Is this request urgent or time sensitive?

Yes. This issue is in production and it will likely start causing DB malfunctioning in the wikis with the most registered users.

Event Timeline

One request is to have a user_properties name table. Don't store it as strings. i.e.

CREATE TABLE user_property_default (
    upd_id INT UNSIGNED AUTO_INCREMENT NOT NULL,
    upd_property_id INT UNSIGNED NOT NULL,
    upd_user_type INT UNSIGNED NOT NULL,
    upd_min_user INT UNSIGNED NOT NULL,
    upd_min_bucket INT UNSIGNED NOT NULL,
    upd_value BLOB,
    PRIMARY KEY (upd_id),
    UNIQUE KEY (upd_property_id, upd_user_type, upd_min_user, upd_min_bucket)
);

and

CREATE TABLE user_property_names (
    upn_id INT UNSIGNED AUTO_INCREMENT NOT NULL,
    upn_property_name VARBINARY(255) NOT NULL,
    PRIMARY KEY (upn_id),
    UNIQUE KEY (upn_property_name)
);

That way, we will make normalization of user_properties easier in the future.

Is there a task about user_properties normalization?

I haven't seen one. Wanna make it?

Urbanecm_WMF updated Other Assignee, added: Urbanecm_WMF.
Urbanecm_WMF subscribed.

Setting assignees as discussed during today's meeting with Sergio.

As I'm thinking about this task, I'm not completely sure the defaults table is the best approach. While the defaults table would handle an A/B test in progress easily (set the user bucket to the right number and you're done), and does so in a row-efficient way because of the bucketing, I'm not sure how post-A/B testing would be handled. I believe handling post-A/B testing is somewhat more important, as A/B tests are temporary, but whatever is done post-test stays with us for a longer time.

In my experience, post A/B testing cleanup is often associated with mass changes of user properties to a different value. In Growth's case, growthexperiments-homepage-variant is changed from the treatment value (such as linkrecommendation) to the control value (always control) for all users. The cheapest way to do that with the defaults table is to change the defaults appropriately, but the practical solution a team member would probably use for that is calling the userOptions.php maint. script, which (assuming no other changes) would simply insert a bunch of rows into user_properties, contributing to its bloat.

Since tampering with the database (and manually running UPDATEs / DELETEs on the defaults table) doesn't scale, is error-prone and is not generally welcomed, what should we do instead? Should userOptions.php (or UserOptionsManager) somehow realize it is doing a mass change, and check whether it would make more sense to change the defaults rather than individual values? Is there an easy way to implement that? Should we have a different script for that, and rely on the deployer to run the proper one? Should we do something else?

Thought: Maybe it would make sense to instead have time-based defaults set in the configuration (via UserGetDefaultOptions, for example), and when the time is to disable A/B testing, simply make a code change to remove the registration-based defaults and be done? Granted, the timestamp of A/B testing start might not (and probably will not be) same in all installations, but even then, the default options can be managed from a MW config variable as well. I imagine this can live in extension.json, similar to how defaults are configured now. This can look like this:

{
     ...,
     "DefaultUserOptions": {
          "same-default-for-everyone": "default value for all users",
          "bucketed-defaults": "fallback default value"
     },
     "BucketedDefaultUserOptions": {
          "bucketed-defaults": {
               "20080101000000": "default value for users who registered after January 01, 2008",
               "20230101000000": "default value for users who registered after January 01, 2023"
          }
     }
}

I realize I'm saying this quite late in the process, and this is not meant to block the defaults table in any way (I think we can figure out a good way to clean up after A/B tests with it, too), but if anyone has thoughts, I'd be very happy to read them.

@Urbanecm_WMF I understand your point but I think if we make a maint script that can properly take care of the changes, it should be fine. As long as it's not inserting tens of millions of rows, it should be okay too.

> Since tampering with the database (and manually running UPDATEs / DELETEs on the defaults table) doesn't scale, is error-prone and is not generally welcomed, what should we do instead? Should userOptions.php (or UserOptionsManager) somehow realize it is doing a mass change, and check whether it would make more sense to change the defaults rather than individual values? Is there an easy way to implement that? Should we have a different script for that, and rely on the deployer to run the proper one? Should we do something else?

I wouldn't be opposed to having a maintenance script that takes care of those operations, although I understand your concerns about them being error-prone or unconventional compared to using userOptions.php to do a mass change. It would indeed be nice if userOptions.php could detect such a mass change and at least throw an error that tells the developer to use the default options script instead, if possible. @tstarling since you suggested the original proposal, it would be great to hear your point of view on this matter.

> Thought: Maybe it would make sense to instead have time-based defaults set in the configuration (via UserGetDefaultOptions, for example), and when the time is to disable A/B testing, simply make a code change to remove the registration-based defaults and be done? Granted, the timestamp of A/B testing start might not (and probably will not be) same in all installations, but even then, the default options can be managed from a MW config variable as well. I imagine this can live in extension.json, similar to how defaults are configured now. This can look like this:

I like the approach for its simplicity but it seems it would only take care of time-based use cases, not user types.

> I realize I'm saying this quite late in the process, and this is not meant to block the defaults table in any way (I think we can figure out a good way to clean up after A/B tests with it, too), but if anyone has thoughts, I'd be very happy to read them.

The time is just fine, we expect to move this initiative forward soon since the table growth is not sustainable for many wikis already. Discussing the best fit for a solution that will stick around for long is absolutely welcome.

> @Urbanecm_WMF I understand your point but I think if we make a maint script that can properly take care of the changes, it should be fine. As long as it's not inserting tens of millions of rows, it should be okay too.

I understand the maint script could take care of the changes in the Wikimedia environment, but core (and many extensions) are used by third parties as well. With an extension.json-based setup, such changes would take effect for third parties automatically, similar to any other code update. With a database-based setup, the maintenance script would need to be wired to update.php as well.

Our generic data migration support consists of DatabaseUpdater::addPostDatabaseUpdateMaintenance(), but that requires a full maintenance script to exist (its sole parameter seems to be the class name, subclassing LoggedUpdateMaintenance). Should we be adding an alternative data migration endpoint, that would support executing an existing script with given parameters once? Or should we expect developers to create a migration maint script any time they do such a change in defaults? That feels like a significant increase to me, but I might be wrong.

It is possible to trigger run of a generic maint script with arguments: I do this with links migrations https://github.com/wikimedia/mediawiki/blob/21351b7b24c768fbc929bfcec4cd1de81696c327/includes/installer/DatabaseUpdater.php#L1227

Option 3
Versioned properties. It is straight up forbidden to ever change the default value of an existing property. Well, it is, but this creates a new property with a new name. A naming scheme for this can be as simple as adding -v2 and so on to the end. All versions are documented in extension.json along with their defaults. The latest version replaces all previous versions.

Reading such a versioned setting is entirely unambiguous. We always know what it means when we don't find anything in the database: it means that the user never changed the default – and we still know every default.

Users that never touched the default will see a change when a version 2 with another default is introduced. E.g. a feature that was disabled by default but is now enabled by default will turn itself on for all users. I.e. the new default replaces the old one. But this is exactly how we want it to behave, isn't it?

Old entries in the database are either cleaned up on the fly whenever a user saves their settings, or regularly with a maintenance script. How two versions of a setting need to be converted is unambiguous and can be done any time.
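
One possible reading of this scheme, sketched in Python. The version list, the property names, and both helper functions are hypothetical, and the rule "the newest version with a stored row wins" is an assumption about how stored values from older versions would be interpreted; the comment above does not spell that out.

```python
# Versions of one logical setting, oldest first, each with its documented default.
VERSIONS = [
    ("example-feature", "0"),
    ("example-feature-v2", "1"),
]
VERSION_NAMES = {name for name, _default in VERSIONS}


def effective_value(stored):
    """Resolve the setting from a user's stored rows (dict of name -> value)."""
    for name, _default in reversed(VERSIONS):
        if name in stored:
            return stored[name]  # an explicit choice made under some version
    return VERSIONS[-1][1]       # nothing stored: the latest default applies


def migrate_on_save(stored):
    """Fold rows of older versions into the latest name, e.g. when preferences are saved."""
    cleaned = {k: v for k, v in stored.items() if k not in VERSION_NAMES}
    if VERSION_NAMES & stored.keys():
        cleaned[VERSIONS[-1][0]] = effective_value(stored)
    return cleaned
```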

One of the main differences to the proposals above is that this works without having to specify an awkwardly specific point in time. This worries me the most. Is that point in time supposed to be a deployment date? If so, don't we need three deployment dates for the three groups?

Bonus: Add a section to extension.json that allows us to list obsolete properties that have no corresponding code any more and should be stripped from the database. See T300371 for context.

> Option 3
> Versioned properties. It is straight up forbidden to ever change the default value of an existing property. Well, it is, but this creates a new property with a new name. A naming scheme for this can be as simple as adding -v2 and so on to the end. All versions are documented in extension.json along with their defaults. The latest version replaces all previous versions.
>
> Reading such a versioned setting is entirely unambiguous. We always know what it means when we don't find anything in the database: it means that the user never changed the default – and we still know every default.

Thanks for the alternative option, Thiemo! While that understanding is correct, the main difficulty is determining whether the default was not changed because the user is not aware of the configuration setting, or because they like what the default does for them. This is unfortunately impossible to tell precisely without saving user_properties rows for all preferences (not just diffs from the default), which would make T54777: user_properties table bloat worse rather than help resolve it. We have to operate on assumptions here, some of which I try to clarify in the rest of my comment.

> Users that never touched the default will see a change when a version 2 with another default is introduced. E.g. a feature that was disabled by default but is now enabled by default will turn itself on for all users. I.e. the new default replaces the old one.

I might be missing something, but I don't understand how this differs from the current behaviour. When you change the default of the foo property, the effective value changes for users who never changed their setting, and stays whatever they chose for users who did. With the versioned setting, this would be a bit harder to calculate (you'd need to work with the versions somehow), but the end effect seems to be equal in both cases. In other words, I do not understand how this resolves the problem this task aims to resolve.

> But this is exactly how we want it to behave, isn't it?

Not really. This is the desired behaviour for some changes, but for many others, we want the change of defaults to affect only users who registered after the default was changed. We do this because we often know that current users left the default alone simply because it fits their needs, and we also know that newly registered users have different needs, which means two defaults are needed. Several extensions do this, such as Echo (see also T54777#9139837) or GrowthExperiments.

Taking all those extensions (features) together, a freshly registered Wikipedia user, who never ever visited Special:Preferences, has 21 user_properties rows (FWIW, 16 of them come from Growth-maintained features, which is why the Growth team is working on this currently). Some of those rows are used as "seen flags" (and they eventually disappear, once the user is fully onboarded), but most of them are simply overrides of defaults for newly registered users only.

A specific example: The username in the user menu (top-right) goes to your user page. This is expected (and relied on) by experienced users, but it is very, very unintuitive for newcomers. Research shows newcomers expect to see a "how do I get started" kind of page there, which is exactly what the Growth team displays there. But deploying this kind of feature to experienced users would result in a huge community pushback, and understandably so, as the Homepage has little to no benefit for experienced users (they already know what they want to do on the wiki, and don't need the Growth team to give them tips).

To conclude: with both the current setup and the versioned properties setup you suggested, it wouldn't be possible to make changes for new users only without adding user_properties rows for every new user, and thus without creating the T54777: user_properties table bloat problem. Since solving T54777 is the goal of this task, I do not understand how versioned properties would help to resolve this.

> One of the main differences to the proposals above is that this works without having to specify an awkwardly specific point in time. This worries me the most. Is that point in time supposed to be a deployment date?

It's meant to be a "switch date". The switch date would certainly be close to the deployment date, but doesn't need to be exactly the deployment date. As I mentioned above, this task seeks to provide a way to have time-based user property defaults, with the goal of making changes that would be disliked by users who have already "learned to like" how things behaved previously, without forcing new users to "learn to like" an unintuitive feature/interface. I do not see how that would be possible without defining the time (be it via a registration timestamp, or via a user ID).

With option 2 (extension.json-provided registration timestamps), we would need to account only for significant divergence between the switch date and the deployment date. For example, the Growth team deployed its features to the vast majority of Wikipedias within a couple of weeks. A few Wikipedias had the features for longer (sometimes years longer), but that was because Growth collaborated with those communities to pilot its features (and experiment on their newcomers). Since most things that are set via extension.json can be overridden in LocalSettings.php (or in WMF's case, operations/mediawiki-config), I do not think this is a problem.

> If so, don't we need three deployment dates for the three groups?

I don't think so. In practice, it doesn't matter if the default doesn't change for users who registered during the last train. Those users are still newcomers, and they probably won't even notice the change. What matters is that it doesn't change for users who have been here for years, because those users have already learned to like the current interface (otherwise, they probably wouldn't be editing :)). For that purpose, using an artificially specified switch date should be sufficient.

If per-wiki differences are needed in special cases, they can be taken care of within the configuration. For the original solution (the defaults table), the point in time would be determined per-wiki and automatically. The main difference between the defaults table and extension.json is in the implementation difficulty; they otherwise work on similar assumptions.

> Bonus: Add a section to extension.json that allows us to list obsolete properties that have no corresponding code any more and should be stripped from the database. See T300371 for context.

While I agree this bonus would be helpful to have, I don't think it would solve the problem this task aims to solve.

I hope that this comment helps to clarify the goals of the suggested change and the problems associated with it. If option 3 resolves some (or all) of those problems in a way I didn't understand, I'd be happy to learn more. Thank you!

Hi and thank you for your interest! Please check thoroughly https://www.mediawiki.org/wiki/New_Developers (and all of its communication section!). The page covers how to get started, assigning tasks, task status, how to find a codebase, how to create patches, where to ask general development questions and where to get help with setup problems, and how to ask good questions. Thanks a lot! :)

> Old entries in the database are either cleaned up on the fly whenever a user saves their settings, or regularly with a maintenance script. How two versions of a setting need to be converted is unambiguous and can be done any time.

+1


My preference would be option 1. It's flexible and efficient, and, most importantly for me and the Data Products team, it can also be managed via a UI. This hypothetical UI could:

  1. Allow all contributors (even community members) to view the current state of an experiment
  2. Allow all contributors to view the history of an experiment, e.g. when it was created, when it was enabled, etc.
  3. Allow all contributors to view historic A/B tests
  4. Allow non-technical contributors (e.g. Product Owners, Product Analysts) to enable or disable an A/B test without help from a technical contributor
  5. Allow non-technical contributors to document and add links to A/B tests

This is something that the Data Products team is starting to investigate building (e.g. see T335058: Read views and T335063: Read view: One-click disable/enable). Being able to leverage a database in Core would undoubtedly make that easier.

Option 2 is appealing to me as well, as it fits better with the existing developer workflows around deploying changes. Option 1 would definitely require us to change some of our workflows, and as originally proposed it doesn't seem to have a lot of benefits – these would only come after we add the management interface, some logging or history, user permissions to change settings and edit documentation…

I want to note one thing: there's no reason why we can't do both. From the point of view of a developer dealing with MediaWiki preferences, all of this should be abstracted away in some service that takes a user and returns all of their default settings. (We even already have the DefaultOptionsLookup service, although this one returns defaults that are independent of the user.) This service could look up the per-user defaults in the config, in the database, or indeed in both. So I'd start with option 2, and avoid doing anything that would make it impossible to introduce option 1 with the database access later, once someone volunteers to implement all of the required user interface bits.
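
To make the "do both" idea concrete, here is a minimal Python sketch of a lookup that consults several sources in turn. Everything in it is hypothetical (the provider functions, the sample data, the provider order); it is not the interface of the existing DefaultOptionsLookup service, only an illustration of the layering.

```python
STATIC_DEFAULTS = {"example-pref": "static", "other-pref": "x"}

# Hypothetical config-based rule: (earliest registration timestamp, value).
CONFIG_CONDITIONAL = {"example-pref": ("20230101000000", "from-config")}

# Stand-in for a database lookup, keyed by (preference name, user ID).
DB_CONDITIONAL = {("example-pref", 42): "from-db"}


def from_database(user, name):
    return DB_CONDITIONAL.get((name, user["id"]))


def from_config(user, name):
    rule = CONFIG_CONDITIONAL.get(name)
    if rule and user["registration"] is not None and user["registration"] >= rule[0]:
        return rule[1]
    return None


def default_for(user, name, providers=(from_database, from_config)):
    """Ask each provider in order; fall back to the user-independent default."""
    for provider in providers:
        value = provider(user, name)
        if value is not None:  # None means "this source has no opinion"
            return value
    return STATIC_DEFAULTS.get(name)
```

Starting with option 2 would correspond to shipping only the config-backed provider; a database-backed provider could be added later without changing callers.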

(I feel like I don't quite understand the merits of option 3, but being unable to ever change the default value of an existing property doesn't seem like a good thing.)


Also, regardless of what is implemented, we should consider what happens when the user resets their preferences. Would it be confusing for two users to both reset their preferences and end up with different ones? Or would it be more confusing for a user to reset their preferences and end up with different ones than when they registered? It may be worth it to offer both options ("reset to defaults for this wiki" and "reset to my original preferences").

I'm also in favor of option 2 (I made a similar proposal in T54777#7860076):

  • Less state means it's easier to understand and easier to debug.
  • Option 1 would mean you can't tell what the default value is for different user groups, without consulting the DB across hundreds of wikis (or the git history of $wgDefaultUserOptions) which doesn't seem great.
  • IIUC option 1 doesn't let you change the default value for existing users (other than by changing user_property_default DB rows, potentially across hundreds of wikis). It would basically mean that $wgDefaultUserOptions stops mattering for existing users - if you change it, the change automagically gets recorded in the DB and only applied to users who have registered after that change was made. That's often useful, but other times we actually want to change the default for existing users (do an opt-out rollout for everyone, for example). We could write a maintenance script for that, sure, but it does seem unnecessarily complicated.
  • Since option 2 relies on the configuration variable system, you can provide defaults for new MediaWiki installs, while option 1 doesn't easily support that (you could use the DB updater and migration scripts, sure, but again it's unnecessarily complicated). While most of the proposed use cases like A/B tests or gradual rollouts only make sense for a site administrator and not as a software default, different defaults for different user types (temp / primary wiki / not primary wiki) do make sense to set in the software.
  • Option 2 means you can "schedule" a rollout while with option 1 you have to make the configuration change in real-time (unless you manually mess with the user_property_default table). Not sure if that's a pro or a con. On one hand, with option 1 you can test the new configuration on mwdebug as you deploy it, with option 2 that doesn't seem possible. On the other hand, often rollouts are made on a pre-announced date and it would be convenient if we could just schedule them up ahead.

I don't think option 3 works - IIUC the code that needs to know the user's preference would have to be aware of the multiple preference versions, but changing the default is usually a site administrator concern, not a developer concern.


Neither option really describes how the non-userid/registration-date based parameters would be set. I think it's worth writing it out fully before making a decision. For option 2, should we have something like

$wgDynamicUserOptions['some-preference'] = [
    'temp' => 'value for temp users',
    '*' => [
        '*' => 'value for old users',
        '2023-07-05-' => [
            '*' => 'value for control group of new users',
            '950-' => 'value for 5% of new users',
        ],
    ],
];

? That's fully generic but quite unwieldy. Or would we have separate configuration options for new/old vs. A/B test buckets vs. user types? How would they interact? For option 1 that's even less clear.

Also, how would we add a new dimension (say different preference by user group, which would be useful for QA testing in production)? For option 1 that would require a schema change. For option 2 with the above array structure, it seems like a very painful breaking change. Alternatively, we could use a non-hierarchical notation like

$wgDynamicUserOptions['some-preference'] = [
    [
        'temp' => true,
        'value' => 'value for temp users',
    ],
    [
        'after' => '2023-07-05',
        'bucket' => 950,
        'value' => 'value for 5% of new users',
    ],
    [
        'after' => '2023-07-05',
        'bucket' => 0,
        'value' => 'value for control group of new users',
    ],
    [
        'value' => 'value for old users',
    ],
];

which is easy to extend but even more unwieldy.
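
A sketch of how such a flat rule list might be evaluated, written in Python for illustration. The "first matching rule wins" ordering, the inclusive treatment of 'after' and 'bucket', and the shape of the user record are all assumptions that the proposal leaves open.

```python
from datetime import date

RULES = [
    {"temp": True, "value": "value for temp users"},
    {"after": "2023-07-05", "bucket": 950, "value": "value for 5% of new users"},
    {"after": "2023-07-05", "bucket": 0, "value": "value for control group of new users"},
    {"value": "value for old users"},
]


def matches(rule, user):
    """A rule matches when every condition it names holds for the user."""
    if "temp" in rule and rule["temp"] != user["temp"]:
        return False
    if "after" in rule:
        registered = user["registered"]
        if registered is None or registered < date.fromisoformat(rule["after"]):
            return False
    if "bucket" in rule and user["bucket"] < rule["bucket"]:
        return False
    return True


def resolve(rules, user):
    """Return the value of the first matching rule, or None if nothing matches."""
    for rule in rules:
        if matches(rule, user):
            return rule["value"]
    return None
```

Adding a new dimension (say, user group) would then mean teaching `matches` one more key, without touching existing rules.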


> So I'd start with option 2, and avoid doing anything that would make it impossible to introduce option 1 with the database access later, once someone volunteers to implement all of the required user interface bits.

I'd start with option 2, and also exclude A/B testing / buckets. It's easy to add later, it's not related to the problem in immediate need of solving (preference bloat due to different defaults for new users), and it's complex enough that it might be better to give it a dedicated mechanism (e.g. the one proposed in T242835: RFC: Standard method for feature-management in skins/extensions). That would also alleviate @phuedx's concern in T321527#9190432. Or at least delay it :)


Do we want to apply this mechanism to gradual rollouts that were done in the past (e.g. at some point in time it was decided that new users should not get "reverted" Echo notifications so Echo started adding that preference for new users - we could move that to dynamic configuration and delete a ton of user_properties rows)? With option 2 that's easy in an approximate way (where we'd identify a "close enough" date, and then for a few users who registered near that date, their preference would change). Option 1 could theoretically be more accurate but seems quite hard to use for that. Option 2 would not work for dates before user_registration existed (2007-ish?) but I don't think that's a concern in practice.


I think Fandom's opinion would be valuable here, given their very different scaling requirements (if they are relying on MediaWiki's default user options service at all). @TK-999 do you know who could provide feedback?

Are we stuck with ‘dynamic’ for the name? What about wgConditionalUserOptions?

$wgConditionalUserDefaults? A user should be able to opt out of experiments.

I think that's a lot more descriptive. Dynamic is just one of those words that can mean anything and nothing.

Not to bikeshed, but maybe "ComputedUserDefaults" to highlight that they're not free and people should use static defaults where possible?

Thanks for the CC.

Historically, we used additional conditions (e.g. an $user->getRegistration() range check) at places where the option was used when we needed such behavior. It seems odd to me to have this be core functionality, especially as it may be difficult to adapt, should there be a need for additional condition types. It seems more natural to have extension(s) encapsulate whatever logic they need to apply the effect of their user options conditionally.
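
The point-of-use check described above might look roughly like the following. This is a minimal sketch, not Fandom's actual code: the function name, the cutoff and the returned values are hypothetical, and the registration timestamp is assumed to be a 14-digit MediaWiki timestamp such as the one returned by $user->getRegistration().

```php
<?php
// Hypothetical helper: decides the effective value of an option at the place
// where it is used, based on when the account was registered.
// Timestamps are assumed to be 14-digit strings (YYYYMMDDHHMMSS), which
// compare correctly as plain strings.
function getEffectiveOption( ?string $storedValue, ?string $registration, string $cutoff ): string {
    if ( $storedValue !== null ) {
        // An explicitly saved preference always wins.
        return $storedValue;
    }
    // Accounts without a known registration date are treated as old.
    $isNewUser = $registration !== null && strcmp( $registration, $cutoff ) >= 0;
    return $isNewUser ? 'new-default' : 'old-default';
}

// In MediaWiki the second argument would come from $user->getRegistration().
echo getEffectiveOption( null, '20230801120000', '20230705000000' ), "\n";
```

Every call site that reads the option has to go through such a helper, which is the duplication a core mechanism would avoid.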

[...]
Not to bikeshed, but maybe "ComputedUserDefaults" to highlight that they're not free and people should use static defaults where possible?

I don't think that really describes the purpose, as the defaults are not really computed (they merely differ in time, but otherwise are static). I think "use regular defaults whenever possible" should be highlighted in docs instead, and enforced by code review.

Not to bikeshed, but maybe "ComputedUserDefaults" to highlight that they're not free and people should use static defaults where possible?

With option 2 I don't think there's a nontrivial difference in performance; computed defaults just require a couple of extra array lookup operations.

With option 1... that's a good point. Loading all preferences of a given user from user_properties is a single query, and very efficient (partial primary key lookup). Loading a single default for a given user from user_property_default would be efficient (index range lookup with a limit of 1). Loading all the defaults for a given user is not straightforward, though. I guess we could just drop the LIMIT and add a (upd_user_type, upd_min_user, upd_min_bucket) index and get all the buckets for all the properties for the given user type, and sort it out on the PHP side? Potentially a lot of rows though.

But then user preferences are cached so it's not something that would have to be done often.
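
"Sorting it out on the PHP side" for option 1 could be sketched as below. Assumptions: the rows were already filtered by user type, upd_min_user is the lowest user ID a declaration applies to, and upd_property and upd_value are hypothetical column names (only upd_user_type, upd_min_user and upd_min_bucket are named in this discussion). Buckets are ignored to keep the sketch short.

```php
<?php
// Picks, for every property, the row with the highest upd_min_user that is
// still <= the given user ID. upd_property and upd_value are hypothetical
// column names; bucket handling is left out for brevity.
function resolveDefaults( array $rows, int $userId ): array {
    $best = [];
    foreach ( $rows as $row ) {
        if ( $row['upd_min_user'] > $userId ) {
            continue; // Declaration applies only to newer users.
        }
        $property = $row['upd_property'];
        if ( !isset( $best[$property] ) || $row['upd_min_user'] > $best[$property]['upd_min_user'] ) {
            $best[$property] = $row;
        }
    }
    // Reduce each winning row to its value, keeping the property names as keys.
    return array_map( static fn ( array $row ) => $row['upd_value'], $best );
}

$rows = [
    [ 'upd_property' => 'some-preference', 'upd_min_user' => 0, 'upd_value' => 'old' ],
    [ 'upd_property' => 'some-preference', 'upd_min_user' => 1000, 'upd_value' => 'new' ],
    [ 'upd_property' => 'other-preference', 'upd_min_user' => 5000, 'upd_value' => 'x' ],
];
var_export( resolveDefaults( $rows, 1500 ) );
```

The loop is linear in the number of rows returned, so the cost concern is the size of the result set rather than the PHP work itself.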

Finding all users with a given preference value would also become hard (with both options), but I don't think we need that.

Historically, we used additional conditions (e.g. an $user->getRegistration() range check) at places where the option was used when we needed such behavior.

How would that work when the extension author and the site operator are not the same entity?

For example, Echo now hardcodes which preferences have different defaults for new and existing users. If you want different new user notification preferences on your site, you are out of luck.

We could have each extension define its overrides in its own way and apply it via the UserGetDefaultOptions hook. That would be functionally the same as option 2 here, but a single logic that applies to all preferences seems much more flexible and maintainable than each extension and each core preference reimplementing the same logic.

It seems odd to me to have this be core functionality, especially as it may be difficult to adapt, should there be a need for additional condition types.

There's always the question of how generic you want to be. I agree with @matmarex, DefaultOptionsLookup is already a service and can be overridden so no one is bound to use core's default preferences scheme, plus there's the hook for one-off things.

Should dynamic/computed preferences be implemented in such a way that other MediaWiki sites can add additional conditions that are not defined in core? With option 2 it seems doable, but feels YAGNI to me.

Neither option really describes how the non-userid/registration-date based parameters would be set. I think it's worth writing it out fully before making a decision. For option 2, should we have something like <snip />

We might also consider something like $wgAutopromote (which maps some value, a group name, to a set of conditions):

$wgConditionalUserOptions['some-preference'] = [
    [ 'value for old users' ],
    [ 'value for temp users', CUDCOND_TEMP ],
    [
        'value for 5% of new users',
        [ CUDCOND_AFTER, '2023-07-05' ],
        [ CUDCOND_BUCKET, 950 ],
    ],
    [
        'value for control group of new users',
        [ CUDCOND_AFTER, '2023-07-05' ],
        [ CUDCOND_BUCKET, 0 ],
    ],
];

This doesn't resolve any of your open questions though.
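
One possible reading of the bucket conditions in the example above, as a sketch only: it assumes 1000 buckets and that [ CUDCOND_BUCKET, N ] matches users whose bucket is N or higher, so a threshold of 950 covers 5% of users. Neither assumption is specified in this task, and deriving the bucket from the user ID is a hypothetical choice.

```php
<?php
const BUCKET_COUNT = 1000; // Assumed; the task does not fix the number of buckets.

// Hypothetical bucket assignment: stable for a given user, roughly uniform.
function getBucket( int $userId ): int {
    return $userId % BUCKET_COUNT;
}

// Assumed meaning of [ CUDCOND_BUCKET, $minBucket ]: bucket is at least $minBucket.
function matchesBucket( int $userId, int $minBucket ): bool {
    return getBucket( $userId ) >= $minBucket;
}

// Share of users matching a threshold of 950: buckets 950..999, i.e. 50 of 1000.
$matching = 0;
for ( $id = 1; $id <= 100000; $id++ ) {
    if ( matchesBucket( $id, 950 ) ) {
        $matching++;
    }
}
printf( "%.1f%%\n", 100 * $matching / 100000 ); // prints 5.0%
```

Under this reading a threshold of 0 matches every user, which is consistent with the "control group" case in the example.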


So I'd start with option 2, and avoid doing anything that would make it impossible to introduce option 1 with the database access later, once someone volunteers to implement all of the required user interface bits.

I'd start with option 2, and also exclude A/B testing / buckets. It's easy to add later, it's not related to the problem in immediate need of solving (preference bloat due to different defaults for new users), and it's complex enough that it might be better to give it a dedicated mechanism (e.g. the one proposed in T242835: RFC: Standard method for feature-management in skins/extensions). That would also alleviate @phuedx's concern in T321527#9190432. Or at least delay it :)

Agreed.

KStoller-WMF moved this task from Inbox to Current Maintenance Focus on the Growth-Team board.
KStoller-WMF moved this task from Current Maintenance Focus to Backlog on the Growth-Team board.
Sgs set the point value for this task to 8. Oct 31 2023, 5:12 PM
Sgs moved this task from Incoming to Ready on the Growth-Team (Sprint 2 (Growth Team)) board.
Urbanecm_WMF updated Other Assignee, added: Sgs; removed: Urbanecm_WMF.
Urbanecm_WMF added a subscriber: Sgs.

Change 978486 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/core@master] Move user options related classes into its own namespace

https://gerrit.wikimedia.org/r/978486

I started working on this today by reorganizing classes responsible for user options a bit (they were in MediaWiki\User, I moved them to MediaWiki\User\Options instead). I'm now working on a class to load conditional user defaults from configuration. After that, we'll also need to do this:

  • Update all other MediaWiki repositories to reflect the new names of user options classes (Amir's LSC will likely come in handy)
  • Integrate ConditionalUserOptionsDefaultsLookup into DefaultOptionsLookup
  • Test the functionality
  • Make the configuration change for Echo, so that new rows stop being added
  • Identify other extensions that would benefit from conditional defaults and make the configuration change for them as well

We should probably also delete the redundant rows once all of this is done, but that can be left for later now.

Change 978537 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/core@master] WIP: Add support for conditional user defaults

https://gerrit.wikimedia.org/r/978537

Change 978486 merged by jenkins-bot:

[mediawiki/core@master] Move user options related classes into its own namespace

https://gerrit.wikimedia.org/r/978486

Urbanecm_WMF renamed this task from Support dynamic defaults for user properties to Support conditional defaults for user properties. Dec 13 2023, 4:13 PM

I think it would be nice to document in a comment here the envisioned syntax for the configuration setting, in part so people can give feedback without having to read the patch, and in part because the patch only implements the one condition we have an immediate need for (registration date), but have use cases for other options (e.g. autocreated account vs. account on home wiki), and it's not entirely obvious how the syntax would generalize. Would it be what @phuedx suggested in T321527#9215021 (except without the "old default" value)?

Yes, that's what I settled on. For the record, here is the documentation of the $wgConditionalUserOptions variable that I just added as a PHP docstring into the patch:

Map of user options to conditional defaults descriptors, which is an array
of conditional cases [ 'value when condition met', CONDITION ], where CONDITION is either:
(a) a CUDCOND_* constant (when condition does not take any arguments), or
(b) an array [ CUDCOND_*, argument1, argument2, ... ] (when the chosen condition takes at
least one argument).

All conditions are evaluated in order. When no condition matches,
$wgDefaultUserOptions is used instead.

Example of valid configuration:
  $wgConditionalUserOptions['user-option'] = [
      [ 'registered in 2024', [ CUDCOND_AFTER, '20240101000000' ] ]
  ];

List of valid conditions:
  * CUDCOND_AFTER: user registered after given timestamp (args: string $timestamp)

Hope this clarifies. We should also write a mediawiki.org docpage for the newly introduced config variable, of course.
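
To make the docstring more concrete, here is a minimal standalone sketch of how such a descriptor could be evaluated. It is not the code from the patch: the function names are hypothetical, the value given to CUDCOND_AFTER is a placeholder, registration is assumed to be a 14-digit timestamp, and "evaluated in order" is read as "the first matching case wins".

```php
<?php
// Placeholder value; in MediaWiki the constant is provided by core.
define( 'CUDCOND_AFTER', 'registered-after' );

// Evaluates one CONDITION, which is either a bare constant or
// [ CONSTANT, argument1, ... ].
function conditionMatches( $condition, ?string $registration ): bool {
    $args = (array)$condition;
    $type = array_shift( $args );
    switch ( $type ) {
        case CUDCOND_AFTER:
            // 14-digit timestamps compare correctly as strings.
            return $registration !== null && strcmp( $registration, $args[0] ) > 0;
        default:
            throw new InvalidArgumentException( "Unknown condition: $type" );
    }
}

// Returns the value of the first matching case, or the static default.
function getDefaultFor( string $option, ?string $registration, array $conditional, array $static ) {
    foreach ( $conditional[$option] ?? [] as [ $value, $condition ] ) {
        if ( conditionMatches( $condition, $registration ) ) {
            return $value;
        }
    }
    return $static[$option] ?? null;
}

$wgDefaultUserOptions = [ 'user-option' => 'registered before 2024' ];
$wgConditionalUserOptions = [
    'user-option' => [
        [ 'registered in 2024', [ CUDCOND_AFTER, '20240101000000' ] ],
    ],
];

echo getDefaultFor( 'user-option', '20240315000000', $wgConditionalUserOptions, $wgDefaultUserOptions ), "\n";
echo getDefaultFor( 'user-option', '20190101000000', $wgConditionalUserOptions, $wgDefaultUserOptions ), "\n";
```

Adding another condition type in this sketch means adding a constant and one more case in conditionMatches(), which is roughly how the syntax would generalize to cases such as autocreated accounts.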

Change 978537 merged by jenkins-bot:

[mediawiki/core@master] Add support for conditional user defaults

https://gerrit.wikimedia.org/r/978537

MediaWiki Core now has support for conditional user defaults, which was the scope of this task => resolving the task. Rest of the work (such as actually using the new capability in Echo) is tracked under T354459 and subtasks.
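
For illustration, the Echo follow-up could end up as a configuration fragment along these lines, using the syntax documented in this task. The preference name and the cutoff timestamp are assumptions made for this sketch, not values taken from T354459.

```php
// LocalSettings.php (or extension configuration) fragment.
// Assumed preference name and cutoff date; check Echo before reusing.
$wgDefaultUserOptions['echo-subscriptions-web-reverted'] = true;
$wgConditionalUserOptions['echo-subscriptions-web-reverted'] = [
    // Accounts registered after the cutoff do not get "reverted" notifications.
    [ false, [ CUDCOND_AFTER, '20130501000000' ] ],
];
```

With such a fragment in place, the corresponding rows no longer need to be written to user_properties for each newly registered user.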