
📈[Epic] Source-of-truth logic for 'activity' in Wikibase
Closed, Resolved (Public)

Description

Background

As part of our initiative to rewrite metrics cleanly from scratch, we should agree on a single source-of-truth logic for what an 'activated' and an 'active' Wikibase instance is, because many metrics rely on this.

Currently, 'activity' is measured in several different ways across the ecosystem.
The two most notable are based on recent changes and on revisions.

The recent changes are used to measure activity by Wikidata for their metrics (active users and active items).
The caveat of this approach is that the recent changes history is purged after 90 days, so there is no way to find out retrospectively whether an instance was 'active' at a certain moment in time in the past. For this reason, we also currently don't know when a certain instance was 'activated' according to this rule.

The revisions are not the same as recent changes; instead, they show the state of pages at different times in history. They do not include things such as new user creation, page moves and deletes, imports, merges, etc. Looking at revision timestamps only shows us edits.
This logic is currently used for the 'edited in the last 90 days' metric on the Google Cloud dashboard, as well as for calculation of first and last edit in the conversions API call (most recently updated in T364991). This API call was previously used to track the instant drop off and abandonment metrics in a Google sheet.
The advantage of this approach is that the history of revisions is never purged, so it is possible to re-create the values of the metrics retrospectively by looking at each instance's first and last edit.
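
A minimal sketch of the retrospective check described above, assuming the revision timestamps of one instance are already available as a list of datetimes. The function names are hypothetical and only illustrate the idea; this is not the code behind the dashboard or the conversions API call.

```python
from datetime import datetime, timedelta


def first_and_last_edit(revision_times):
    """Return (first, last) edit time, or (None, None) if there are no edits."""
    if not revision_times:
        return None, None
    return min(revision_times), max(revision_times)


def edited_within_window(revision_times, as_of, window_days=90):
    """True if any revision falls in the `window_days` days before `as_of`.

    Because revisions are never purged, `as_of` can be any past date.
    """
    window_start = as_of - timedelta(days=window_days)
    return any(window_start <= t <= as_of for t in revision_times)


# Example: two edits, checked against two different reporting dates.
revisions = [datetime(2024, 1, 10), datetime(2024, 5, 1)]
print(first_and_last_edit(revisions))
print(edited_within_window(revisions, as_of=datetime(2024, 3, 1)))
print(edited_within_window(revisions, as_of=datetime(2024, 4, 20)))
```

The same check cannot be done with recent changes for dates more than 90 days back, because that history is no longer available.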

What should be considered activity in a Wikibase Cloud instance?

We want to use a definition of activity similar to the one in Wikidata: any change in the Wikibase that is recorded in the recent changes.
For Wikibase Cloud instances there is an important exception: the default actions our platform performs when the instance is booted up should not count as activity (for example, creation of default users and of the default main page). Currently, the creation of the default main page is recorded in recent changes, which makes all instances count as activated by default.
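
The exception could be sketched as a filter over recent-changes entries. Everything specific here is an assumption for illustration: the `RecentChange` fields, the `PlatformSetup` actor name and the `Main Page` title are hypothetical, and the real way to recognize default actions may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class RecentChange:
    """Hypothetical, simplified view of one recent-changes entry."""
    timestamp: datetime
    actor: str
    title: str
    is_new_page: bool


# Assumed markers of default platform actions (not taken from the task).
PLATFORM_ACTORS = frozenset({"PlatformSetup"})
DEFAULT_PAGE_TITLES = frozenset({"Main Page"})


def is_default_action(change):
    """True for changes the platform makes while booting up an instance."""
    if change.actor in PLATFORM_ACTORS:
        return True
    # Creation of a default page counts as setup; later edits to it do not.
    return change.is_new_page and change.title in DEFAULT_PAGE_TITLES


def user_activity(changes):
    """Keep only the changes that should count as activity."""
    return [c for c in changes if not is_default_action(c)]
```

With a filter like this, an instance whose only recent change is the default main page would have no activity and would not count as activated.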

(We still want to keep tracking the 'edited in the last 90 days' metric based on revisions at least until the end of the year, since we've been reporting on it until now. It will no longer be considered the right way to recognize an 'active wiki'.)

Recognizing active instances

An instance on Cloud is considered active if it had any activity in the last 30 days (to align with the definition of active users).
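
As a sketch, the rule reduces to a single comparison once the time of the last counted activity is known. The function name and the inclusive 30-day boundary are assumptions, not something this task specifies.

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=30)


def is_active(last_activity, now):
    """True if the instance had activity within the 30 days before `now`.

    `last_activity` is None for an instance that was never activated.
    """
    if last_activity is None:
        return False
    age = now - last_activity
    # Assumption: an activity exactly 30 days old still counts as active.
    return timedelta(0) <= age <= ACTIVE_WINDOW


now = datetime(2024, 8, 31)
print(is_active(datetime(2024, 8, 5), now))
print(is_active(datetime(2024, 7, 1), now))
```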

Recognizing activated instances

Some of the metrics we want to track rely on whether an instance was already activated (had received its first activity) at a certain moment in the past.
However, we only have access to 90 days of recent changes. We need to work around this limitation for existing and all future wikis.

Understanding time since last activity

For some metrics in the future, we might need to understand when the wiki had its last activity (for example, to understand time since it was abandoned).
However, we only have access to 90 days of recent changes. We need to work around this limitation for existing and all future wikis.
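
One possible workaround, sketched under assumptions: persist the first and last activity timestamps per instance and update them whenever counted activity is observed, so the values survive the 90-day purge of recent changes. The `ActivityRecord` class and its `observe` method are hypothetical; only the `first_activity` and `last_activity` names come from this task.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ActivityRecord:
    """Per-instance activity timestamps kept outside recent changes."""
    first_activity: Optional[datetime] = None
    last_activity: Optional[datetime] = None

    def observe(self, timestamp):
        """Fold one counted activity into the record (order-independent)."""
        if self.first_activity is None or timestamp < self.first_activity:
            self.first_activity = timestamp
        if self.last_activity is None or timestamp > self.last_activity:
            self.last_activity = timestamp

    @property
    def activated(self):
        """An instance is activated once it has any recorded activity."""
        return self.first_activity is not None


record = ActivityRecord()
for ts in (datetime(2024, 3, 1), datetime(2024, 1, 15), datetime(2024, 2, 1)):
    record.observe(ts)
print(record.first_activity, record.last_activity, record.activated)
```

For pre-existing wikis the record would need a one-time backfill from whatever history is still available; how to do that is not settled here.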

Acceptance Criteria

  • The 'edited in the last 90 days' metric remains and still relies on revisions. Creation of the default main page is not considered an edit. T372346
  • A new metric 'active in the last 30 days' is created on the Google Cloud dashboard which relies on recent changes. It also doesn't consider creation of the default main page as activity. This logic becomes the proof of concept and source of truth for the definition of 'activity' for all our future metrics. T372771
  • The correct data about first_activity is available for all Wikibase instances - new and pre-existing. It gets updated properly. T372786
  • The correct data about last_activity is available for all Wikibase instances - new and pre-existing. It gets updated properly. https://phabricator.wikimedia.org/T372864

Event Timeline

Anton.Kokh updated the task description.
Anton.Kokh updated the task description.
Anton.Kokh renamed this task from Source-of-truth logic for 'active' Wikibase to 📈Source-of-truth logic for 'active' Wikibase. Aug 12 2024, 7:18 PM
Anton.Kokh renamed this task from 📈Source-of-truth logic for 'active' Wikibase to 📈[Epic] Source-of-truth logic for 'active' Wikibase. Aug 12 2024, 9:06 PM
Anton.Kokh renamed this task from 📈[Epic] Source-of-truth logic for 'active' Wikibase to 📈[Epic] Source-of-truth logic for 'activity' in Wikibase. Aug 19 2024, 1:37 PM
Anton.Kokh updated the task description.
Tarrow updated the task description.
Anton.Kokh claimed this task.

Closing quarterly goal