User impact API: Create GrowthExperimentsUserImpactManager, GrowthExperimentsUserImpactLookup, and GrowthExperimentsUserImpactCompute services
Closed, Resolved (Public)

Description

This task is about creating three services in the GrowthExperiments extension.

Compute

The compute service, when given a user ID, should return an object containing:

  • number of thanks received by the user
  • number of edits made by the user, split by namespace
  • number of edits with the "newcomer task" tag (suggested edits)
  • date of last edit
  • current edit streak and longest historical edit streak
  • for each article that the user has edited:
    • [optional] number of page views for the duration period
    • [optional] data to populate a sparkline showing the page view trend for the duration period
    • [optional] call-to-action links, depending on whether we determine that the user should be able to edit the article again, or whether other types of structured/unstructured edits are possible
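
The task doesn't define how streaks are counted. As a minimal sketch (in Python, although the extension itself is PHP), assuming a streak is a run of consecutive calendar days with at least one edit and that the "current" streak is the run ending on the most recent edit day, both values can be derived from the user's edit dates in one pass. `edit_streaks` is a hypothetical name, not part of GrowthExperiments.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def edit_streaks(edit_dates: Iterable[date]) -> Tuple[int, int]:
    """Return (current_streak, longest_streak), both in days.

    Assumptions (not specified in the task): a streak is a run of
    consecutive calendar days with at least one edit, and the "current"
    streak is the run that ends on the most recent edit day.
    """
    days = sorted(set(edit_dates))  # several edits on one day count once
    if not days:
        return 0, 0
    longest = run = 1
    for prev, cur in zip(days, days[1:]):
        # Extend the run on consecutive days, otherwise start a new one.
        run = run + 1 if cur - prev == timedelta(days=1) else 1
        longest = max(longest, run)
    # `run` now holds the length of the run ending on the latest edit day.
    return run, longest
```

Under this assumption, a user whose last edit was weeks ago still has a non-zero "current" streak; a real implementation would likely compare the last edit day against today (and pick a timezone for day boundaries) before reporting it.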

The items marked [optional] would require an additional flag to the service to be fetched. The idea is that some of this data is cheap and fast to obtain by querying replica database tables (thanks, edit count, date of last edit), while other items are slower and should not be computed on page load. On page load, we would invoke the service for the cheap/fast data points only and export them to the front-end via mw.config; the impact module could then call the API client-side to get the slower data points.
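
One way to express the cheap/expensive split is a single compute entry point with an opt-in flag, where the expensive fields stay empty unless requested. This is a hedged Python sketch: `UserImpact`, `InMemorySource`, `compute_user_impact` and `include_expensive` are hypothetical names, and the in-memory source only stands in for replica queries and the pageview backend.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class UserImpact:
    user_id: int
    # Cheap data points (replica DB queries): safe to compute on page load.
    thanks_count: int
    edits_by_namespace: Dict[int, int]
    last_edit: Optional[date]
    # Expensive data point: stays None unless explicitly requested.
    page_views: Optional[Dict[str, List[int]]] = None


@dataclass
class InMemorySource:
    """Hypothetical stand-in for the replica DB and the pageview backend."""
    thanks: Dict[int, int]
    edits: Dict[int, Dict[int, int]]
    last_edits: Dict[int, date]
    views: Dict[int, Dict[str, List[int]]]


def compute_user_impact(
    user_id: int, source: InMemorySource, include_expensive: bool = False
) -> UserImpact:
    impact = UserImpact(
        user_id=user_id,
        thanks_count=source.thanks.get(user_id, 0),
        edits_by_namespace=dict(source.edits.get(user_id, {})),
        last_edit=source.last_edits.get(user_id),
    )
    if include_expensive:
        # Slow path: only taken when the caller opts in, e.g. a client-side
        # API request made after the page has loaded.
        impact.page_views = source.views.get(user_id, {})
    return impact


source = InMemorySource(
    thanks={1: 4},
    edits={1: {0: 12, 2: 1}},  # namespace ID -> edit count
    last_edits={1: date(2022, 7, 20)},
    views={1: {"Example article": [3, 5, 8]}},
)
on_page_load = compute_user_impact(1, source)  # cheap fields only
from_api = compute_user_impact(1, source, include_expensive=True)
```

The page-load caller would export something like `on_page_load` via mw.config, while the API handler would use the `include_expensive=True` form.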

For each data point, we should store a timestamp associated with the computation of the data point.
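
A simple way to meet the per-data-point timestamp requirement is to wrap each value with the time it was computed, so callers can decide individually when a value is too old to show. Minimal Python sketch; `Timestamped` and `is_stale` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Timestamped(Generic[T]):
    """A computed data point plus the time it was computed."""
    value: T
    computed_at: datetime

    def is_stale(self, max_age: timedelta, now: datetime) -> bool:
        # Callers can choose a different max_age for each data point.
        return now - self.computed_at > max_age


now = datetime.now(timezone.utc)
impact = {
    "thanks_count": Timestamped(4, now),
    "last_edit": Timestamped(datetime(2022, 7, 20, tzinfo=timezone.utc), now),
}
```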

Extensibility: The service should provide a hook to allow for other extensions to modify the computation of the data points.
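
The hook requirement can be sketched as a named list of handlers that run after the core computation and may add or adjust fields. This Python sketch is illustrative only: `HookRegistry` and the `UserImpactComputed` hook name are hypothetical, and MediaWiki's actual hook system has its own registration and invocation mechanism.

```python
from typing import Any, Callable, Dict, List

ImpactData = Dict[str, Any]
Handler = Callable[[int, ImpactData], None]


class HookRegistry:
    """Hypothetical, minimal stand-in for MediaWiki's hook system."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def register(self, hook_name: str, handler: Handler) -> None:
        self._handlers.setdefault(hook_name, []).append(handler)

    def run(self, hook_name: str, user_id: int, data: ImpactData) -> None:
        # Handlers run in registration order and may modify `data` in place.
        for handler in self._handlers.get(hook_name, []):
            handler(user_id, data)


def compute_impact(user_id: int, hooks: HookRegistry) -> ImpactData:
    data: ImpactData = {"thanks": 0, "edits": 0}  # placeholder core computation
    hooks.run("UserImpactComputed", user_id, data)  # hypothetical hook name
    return data


hooks = HookRegistry()
# Another extension adding a (hypothetical) data point of its own.
hooks.register("UserImpactComputed", lambda user_id, data: data.update(extra_metric=1))
```

The Lookup and Storage hooks described below could follow the same shape, with a different hook name and call site.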

Lookup

This service is used for fetching existing data for the user.

Extensibility: The service should provide a hook to allow other extensions to modify the data.

Storage

The storage service will use a MySQL table as the backend in the first iteration; in a future iteration, it may use the Data Gateway (T310253).

Extensibility: The service should provide a hook to allow other extensions to modify data before it is stored.
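
A minimal sketch of a table-backed store with a before-save hook, keeping one JSON blob plus a computation timestamp per user. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere; the table name, column names and `DatabaseImpactStore` class are hypothetical and not the extension's actual schema.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List, Optional

ImpactData = Dict[str, Any]
BeforeSaveHook = Callable[[int, ImpactData], None]


class DatabaseImpactStore:
    """One JSON row per user; sqlite3 stands in for MySQL in this sketch."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._before_save: List[BeforeSaveHook] = []
        # Hypothetical schema; the extension's real table may differ.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS user_impact ("
            " ui_user INTEGER PRIMARY KEY,"
            " ui_data TEXT NOT NULL,"
            " ui_timestamp TEXT NOT NULL)"
        )

    def on_before_save(self, hook: BeforeSaveHook) -> None:
        self._before_save.append(hook)

    def save(self, user_id: int, data: ImpactData, timestamp: str) -> None:
        # Shallow copy, so hooks that add or replace keys leave the caller's dict alone.
        data = dict(data)
        for hook in self._before_save:
            hook(user_id, data)
        # SQLite upsert syntax; MySQL would use REPLACE or ON DUPLICATE KEY UPDATE.
        self._conn.execute(
            "INSERT OR REPLACE INTO user_impact (ui_user, ui_data, ui_timestamp)"
            " VALUES (?, ?, ?)",
            (user_id, json.dumps(data), timestamp),
        )
        self._conn.commit()

    def load(self, user_id: int) -> Optional[ImpactData]:
        row = self._conn.execute(
            "SELECT ui_data FROM user_impact WHERE ui_user = ?", (user_id,)
        ).fetchone()
        return json.loads(row[0]) if row is not None else None


store = DatabaseImpactStore(sqlite3.connect(":memory:"))
store.on_before_save(lambda user_id, data: data.setdefault("schema_version", 1))
original = {"thanks": 3}
store.save(7, original, "20220720104100")
```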

Event Timeline

kostajh renamed this task from User impact API: Create GrowthExperimentsUserImpactStore and GrowthExperimentsUserImpactCompute services to User impact API: Create GrowthExperimentsUserImpactManager, GrowthExperimentsUserImpactLookup, and GrowthExperimentsUserImpactCompute services. Jul 20 2022, 10:41 AM
kostajh triaged this task as Medium priority.
kostajh updated the task description.

We might want to use ID batches as input for the compute service, otherwise the maintenance script might be very slow.
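
A sketch of the batching idea: the maintenance script walks the user IDs in fixed-size chunks, so each compute call can handle many users at once (for example with one `IN (...)` query per batch instead of one query per user). `batched_ids` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched_ids(user_ids: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield lists of at most batch_size IDs, preserving input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(user_ids)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

On Python 3.12+, `itertools.batched` does the same job but yields tuples.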

We should probably set some limits - which users to precompute for (to avoid forever accumulating more users; probably we should limit to users who made an edit in the last N days or such), and how many edits to look back (so the service doesn't explode when a user with a million edits looks at their homepage).

There should probably be a static lookup class and a dumb lookup class (which just calls the collector in realtime) for testing and development.

The task title mentions GrowthExperimentsUserImpactManager but the description talks about a storage class, is that the same?

Re: extensibility, do we plan to add hooks on the frontend side as well, so that more information can be fit in somewhere in the UI? Otherwise, I am not sure how useful it is to be able to store arbitrary other data.

> We might want to use ID batches as input for the compute service, otherwise the maintenance script might be very slow.

> We should probably set some limits - which users to precompute for (to avoid forever accumulating more users; probably we should limit to users who made an edit in the last N days or such), and how many edits to look back (so the service doesn't explode when a user with a million edits looks at their homepage).

Agreed, @KStoller-WMF and I were discussing this yesterday. Throwing some numbers out:

  • upper limit of 100 articles
  • upper limit of 1000 edits
  • upper limit of 60 days

And for the maintenance script, process data for users who meet the conditions:

  • have created an account within the last 60 days, or have edited within the last 60 days
  • are opted into Growth features (homepage is enabled)
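
The limits and the maintenance-script conditions above can be sketched as plain filters. The numbers are the ones floated in this comment and are placeholders; `Edit`, `UserInfo`, `should_precompute`, `capped_edits` and `capped_articles` are hypothetical names, and the real data would come from database queries rather than in-memory lists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence

# Numbers floated in the discussion; placeholders, not final values.
MAX_EDITS = 1000
MAX_ARTICLES = 100
WINDOW = timedelta(days=60)


@dataclass(frozen=True)
class Edit:
    page_title: str
    timestamp: datetime


@dataclass(frozen=True)
class UserInfo:
    user_id: int
    registered: Optional[datetime]  # may be None, e.g. for very old accounts
    last_edit: Optional[datetime]
    homepage_enabled: bool


def should_precompute(user: UserInfo, now: datetime, window: timedelta = WINDOW) -> bool:
    """Maintenance-script filter: opted in, and registered or edited recently."""
    if not user.homepage_enabled:
        return False
    cutoff = now - window

    def recent(ts: Optional[datetime]) -> bool:
        return ts is not None and ts >= cutoff

    return recent(user.registered) or recent(user.last_edit)


def capped_edits(
    edits: Sequence[Edit], now: datetime,
    max_edits: int = MAX_EDITS, window: timedelta = WINDOW,
) -> List[Edit]:
    """Newest-first edits inside the time window, truncated to max_edits."""
    cutoff = now - window
    in_window = [e for e in edits if e.timestamp >= cutoff]
    in_window.sort(key=lambda e: e.timestamp, reverse=True)
    return in_window[:max_edits]


def capped_articles(edits: Sequence[Edit], max_articles: int = MAX_ARTICLES) -> List[str]:
    """Distinct titles in the order given (newest first if from capped_edits)."""
    return list(dict.fromkeys(e.page_title for e in edits))[:max_articles]
```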

> There should probably be a static lookup class and a dumb lookup class (which just calls the collector in realtime) for testing and development.

That sounds like a good idea. It could also be useful to have a remote lookup class, so you could connect your local environment to stats on e.g. enwiki. Since the data we are working with is public, it would be nice if the API allowed you to pass the user ID for an arbitrary user. If we had those things, it would be a lot easier to debug "What does the impact module look like for user X?"
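
A sketch of the lookup variants discussed here, behind one interface: a static lookup serving canned data and a "dumb" lookup that computes on every call. Python, with hypothetical class names; the extension's real services are PHP.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Optional

ImpactData = Dict[str, Any]


class ImpactLookup(ABC):
    @abstractmethod
    def get_user_impact(self, user_id: int) -> Optional[ImpactData]:
        """Return impact data for the user, or None if there is none."""


class StaticImpactLookup(ImpactLookup):
    """Serves fixed, hand-written data: handy for tests and UI work."""

    def __init__(self, data: Dict[int, ImpactData]) -> None:
        self._data = data

    def get_user_impact(self, user_id: int) -> Optional[ImpactData]:
        return self._data.get(user_id)


class RealtimeImpactLookup(ImpactLookup):
    """The "dumb" lookup: recomputes on every call, nothing is stored."""

    def __init__(self, compute: Callable[[int], ImpactData]) -> None:
        self._compute = compute

    def get_user_impact(self, user_id: int) -> Optional[ImpactData]:
        return self._compute(user_id)


static = StaticImpactLookup({1: {"thanks": 3, "edits": 10}})
realtime = RealtimeImpactLookup(lambda user_id: {"user_id": user_id, "thanks": 0})
```

A remote lookup would be one more subclass whose `get_user_impact` queries another wiki's API; it is left out because the API's request and response format isn't specified here.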

> The task title mentions GrowthExperimentsUserImpactManager but the description talks about a storage class, is that the same?

Yeah, I think so. I was thinking about the UserOptionsManager/UserOptionsLookup services when writing this, fwiw; a similar split of responsibilities could work here, I think.
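
By analogy with UserOptionsLookup/UserOptionsManager, the split could be a read-only interface that most code depends on, plus a manager interface that extends it with writes. A minimal Python sketch with hypothetical names and an in-memory backend:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

ImpactData = Dict[str, Any]


class ImpactLookup(ABC):
    """Read side: what most callers (e.g. the homepage) depend on."""

    @abstractmethod
    def get_user_impact(self, user_id: int) -> Optional[ImpactData]: ...


class ImpactManager(ImpactLookup):
    """Write side: only writers (hooks, maintenance scripts) need this type."""

    @abstractmethod
    def save_user_impact(self, user_id: int, data: ImpactData) -> None: ...


class InMemoryImpactManager(ImpactManager):
    def __init__(self) -> None:
        self._rows: Dict[int, ImpactData] = {}

    def get_user_impact(self, user_id: int) -> Optional[ImpactData]:
        row = self._rows.get(user_id)
        # Return a copy so readers cannot modify stored data by accident.
        return dict(row) if row is not None else None

    def save_user_impact(self, user_id: int, data: ImpactData) -> None:
        self._rows[user_id] = dict(data)


manager = InMemoryImpactManager()
manager.save_user_impact(42, {"thanks": 2})
lookup: ImpactLookup = manager  # read-only callers are typed against ImpactLookup
```

Code that only displays impact data would depend on `ImpactLookup`, so swapping in a static or computed implementation for tests doesn't require a writable backend.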

> Re: extensibility, do we plan to add hooks on the frontend side as well, so that more information can be fit in somewhere in the UI? Otherwise, I am not sure how useful it is to be able to store arbitrary other data.

I think it was discussed earlier, and we could consider that. We should at least make a task for it. But I am not sure if it is something we would support in the first iteration of this module.

Data needed, based on the latest design in T313271#8177456:

  • number of (mainspace?) edits made by the user by day
  • time series (details being discussed in T220143) of total pageviews of all articles edited by the user
  • time series (details being discussed in T220141) of pageviews of a select few articles edited by the user (selection mechanism discussed in T220139)
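
Both kinds of series reduce to per-day aggregation. A hedged Python sketch, assuming edits arrive as timestamps and each article's pageviews as a date-to-count mapping (a day missing from one article's series is treated as zero); the function names are hypothetical.

```python
from collections import Counter
from datetime import date, datetime
from typing import Dict, Iterable, Mapping


def edits_per_day(edit_timestamps: Iterable[datetime]) -> Dict[date, int]:
    """Count edits per calendar day (in whatever timezone the timestamps use)."""
    counts = Counter(ts.date() for ts in edit_timestamps)
    return dict(sorted(counts.items()))


def total_pageviews(per_article: Mapping[str, Mapping[date, int]]) -> Dict[date, int]:
    """Sum daily pageviews across articles; a missing day counts as 0."""
    totals: Counter = Counter()
    for series in per_article.values():
        totals.update(series)  # Counter.update adds counts, it does not replace
    return dict(sorted(totals.items()))
```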

Change 829326 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] User impact: Add DatabaseUserImpactStore

https://gerrit.wikimedia.org/r/829326

Change 829327 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] [WIP] Add ComputedUserImpactLookup

https://gerrit.wikimedia.org/r/829327

Change 829326 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] User impact: Add DatabaseUserImpactStore

https://gerrit.wikimedia.org/r/829326

Change 829327 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Add ComputedUserImpactLookup

https://gerrit.wikimedia.org/r/829327

Change 836211 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] UserImpactLookup: Switch to Computed as the default service

https://gerrit.wikimedia.org/r/836211

Change 836212 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] UserImpactHandler: Use correct flag for useLatest

https://gerrit.wikimedia.org/r/836212

Change 837219 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] User impact: Add standard user impact fallback chain

https://gerrit.wikimedia.org/r/837219

Change 843957 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] UserImpact: Add config flag to globally disable new impact module

https://gerrit.wikimedia.org/r/843957

Change 843963 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] labs: Enable GrowthExperiments new impact module

https://gerrit.wikimedia.org/r/843963

Change 843957 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UserImpact: Add config flag to globally disable new impact module

https://gerrit.wikimedia.org/r/843957

Change 836211 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UserImpactLookup: Switch to Computed as the default service

https://gerrit.wikimedia.org/r/836211

Change 843963 merged by jenkins-bot:

[operations/mediawiki-config@master] labs: Allow usage of GrowthExperiments NewImpact module

https://gerrit.wikimedia.org/r/843963

Urbanecm_WMF changed the task status from Open to In Progress. Oct 20 2022, 12:51 PM

Change 853299 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] [WIP] UserImpactHandler: Load database-backed user impact data

https://gerrit.wikimedia.org/r/853299

Change 853501 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Refresh user impact after article edit

https://gerrit.wikimedia.org/r/853501

Change 853511 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Add UserRegistrationLookupHelper

https://gerrit.wikimedia.org/r/853511

Change 853946 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Refresh user data after thanks received

https://gerrit.wikimedia.org/r/853946

Change 853949 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Add helper method to restrict data refresh

https://gerrit.wikimedia.org/r/853949

Change 854972 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Allow refresh for users who have edited in last 7 days

https://gerrit.wikimedia.org/r/854972

Change 836212 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UserImpactHandler: Load from ExpensiveUserImpact by default

https://gerrit.wikimedia.org/r/836212

Change 853299 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UserImpactHandler: Load database-backed user impact data

https://gerrit.wikimedia.org/r/853299

Change 853501 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Refresh user impact after article edit

https://gerrit.wikimedia.org/r/853501

Change 853946 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Refresh user data after thanks received

https://gerrit.wikimedia.org/r/853946

Change 853949 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Add helper method to restrict data refresh

https://gerrit.wikimedia.org/r/853949

Change 854972 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: Allow refresh for users who have edited in last 7 days

https://gerrit.wikimedia.org/r/854972

Change 855971 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] [WIP] UserImpactHandler: Get data on demand

https://gerrit.wikimedia.org/r/855971

Change 855971 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] UserImpactHandler: Re-calculate data on demand

https://gerrit.wikimedia.org/r/855971

Change 858415 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: $diff is a DateInterval, not an int

https://gerrit.wikimedia.org/r/858415

Change 858415 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] ImpactHooks: $diff is a DateInterval, not an int

https://gerrit.wikimedia.org/r/858415

Change 837219 abandoned by Gergő Tisza:

[mediawiki/extensions/GrowthExperiments@master] User impact: Add standard user impact fallback chain

Reason:

Not relevant anymore.

https://gerrit.wikimedia.org/r/837219

I'm marking this as resolved. QA on this task wouldn't be very helpful, given the large number of patches and how much the design and requirements evolved.