
[Analytics] Operationalize "UI editors"
Closed, Declined · Public

Description

Scope

We are about to operationalize some types of "UI editors" that we are interested in.

e.g.

  • a registered user
  • that made a minimum number of edits in the last 30/60/90 days
  • using the UI
  • in the Item namespace

Desired output

A table that we can quickly filter for different kinds of registered editors (e.g. at least 1 edit in the last 60 days vs. at least 5 UI edits in the last 30 days).

rows: editors

columns:

  • year
  • month
  • day
  • user_id
  • number of edits
  • namespace
  • access type (for now: desktop UI, mobile UI, and other, i.e. all remaining edits such as those from apps and the API)
  • device type (mobile vs. desktop, if we can get it)
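To illustrate how one table could serve several editor definitions, here is a minimal sketch. It assumes one row per user, day, namespace and access type, with hypothetical names (`EditorDay`, `matching_editors`) and assumed access-type labels `desktop_ui`, `mobile_ui` and `other`. It runs in plain Python over in-memory rows and is not the Data Lake query itself. Namespace 0 is the Item namespace on Wikidata.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical row shape: one row per user, day, namespace and access type.
@dataclass(frozen=True)
class EditorDay:
    year: int
    month: int
    day: int
    user_id: int
    edit_count: int
    namespace: int
    access_type: str  # assumed labels: "desktop_ui", "mobile_ui", "other"

    def as_date(self) -> date:
        return date(self.year, self.month, self.day)


def matching_editors(rows, today, window_days, min_edits,
                     access_types=None, namespace=None):
    """Return the user_ids with at least `min_edits` edits in the
    `window_days` days ending on `today` (inclusive), optionally restricted
    to some access types and to one namespace."""
    start = today - timedelta(days=window_days - 1)
    totals = {}
    for row in rows:
        if not start <= row.as_date() <= today:
            continue
        if access_types is not None and row.access_type not in access_types:
            continue
        if namespace is not None and row.namespace != namespace:
            continue
        totals[row.user_id] = totals.get(row.user_id, 0) + row.edit_count
    return {user for user, n in totals.items() if n >= min_edits}


rows = [
    EditorDay(2023, 10, 1, 101, 3, 0, "desktop_ui"),
    EditorDay(2023, 10, 20, 101, 2, 0, "mobile_ui"),
    EditorDay(2023, 9, 1, 202, 1, 0, "other"),
    EditorDay(2023, 10, 15, 303, 4, 0, "other"),
]
today = date(2023, 10, 24)

# At least 5 UI edits in the Item namespace in the last 30 days:
print(sorted(matching_editors(rows, today, 30, 5,
                              access_types={"desktop_ui", "mobile_ui"},
                              namespace=0)))  # [101]
# At least 1 edit of any access type in the last 60 days:
print(sorted(matching_editors(rows, today, 60, 1)))  # [101, 202, 303]
```

Because the table keeps daily rows, the window length, edit threshold, access types and namespace are all query-time parameters, so adding a new editor definition does not require a new table.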

Notes

Urgency

When this task should be completed by. If this task is time sensitive then please make this clear. Please also provide the date when the output will be used if there is a specific meeting or event, for example.

DD.MM.YYYY


Information below this point is filled out by the Wikidata Analytics team.

General Planning

Information is filled out by the analytics product manager.

Assignee Planning

Information is filled out by the assignee of this task.

Estimation

Estimate:
Actual:

Sub Tasks

Full breakdown of the steps to complete this task:

  • subtask

Data to be used

See Analytics/Data_Lake for the breakdown of the data lake databases and tables.

The following tables will be referenced in this task:

  • link_to_table

Notes and Questions

Things that came up during the completion of this task, questions to be answered and follow up tasks:

  • Note

Event Timeline

Manuel renamed this task from [Analytics] Active UI editors to [Analytics] Operationalize "active UI editors". Oct 17 2023, 9:20 AM
Manuel updated the task description.
Manuel moved this task from Incoming to To-Do on the Wikidata Analytics (Kanban) board.

@Manuel, do we want namespace to just be a column? For simplicity, it might make sense to split all the metrics by namespace and then get per-user totals via a GROUP BY on user_id, rather than storing both the per-namespace data and its aggregate in the same table.
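To illustrate the suggestion: if only per-namespace counts are stored, per-user totals can be derived when reading instead of being saved alongside them. This is a minimal Python sketch over made-up rows, and `totals_by_user` is a hypothetical helper. In the Data Lake, the same step would be a GROUP BY on user_id in the query.

```python
from collections import defaultdict

# Hypothetical per-namespace rows: (user_id, namespace, edit_count).
per_namespace = [
    (101, 0, 4),  # namespace 0 = Item on Wikidata
    (101, 2, 1),  # namespace 2 = User
    (202, 0, 7),
]


def totals_by_user(rows):
    """Collapse per-namespace counts into one total per user,
    the equivalent of GROUP BY user_id with SUM(edit_count)."""
    totals = defaultdict(int)
    for user_id, _namespace, edit_count in rows:
        totals[user_id] += edit_count
    return dict(totals)


print(totals_by_user(per_namespace))  # {101: 5, 202: 7}
```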

Something else to consider here, @Manuel, is what we want for a timestamp. Since the table holds aggregations, a granular timestamp would conflict with the 30-day window. What were you thinking there?
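One way to reconcile a timestamp with 30-day windows is to store counts at a daily grain, as the year/month/day columns in the description do, and to compute any 30/60/90-day window at query time by summing daily rows. The sketch below is a hypothetical helper (`daily_rows`) over made-up event tuples. It assumes timestamps are already parsed to `datetime` and ignores time zones, using the date components as given.

```python
from collections import Counter
from datetime import datetime

# Hypothetical per-edit events: (user_id, timestamp, namespace, access_type).
edits = [
    (101, datetime(2023, 10, 17, 9, 20), 0, "desktop_ui"),
    (101, datetime(2023, 10, 17, 14, 5), 0, "desktop_ui"),
    (101, datetime(2023, 10, 18, 8, 0), 0, "mobile_ui"),
]


def daily_rows(events):
    """Truncate each edit's timestamp to its day and count edits per
    (year, month, day, user_id, namespace, access_type)."""
    counts = Counter(
        (ts.year, ts.month, ts.day, user_id, namespace, access_type)
        for user_id, ts, namespace, access_type in events
    )
    # Append the count to each key and sort for a stable order.
    return sorted(key + (n,) for key, n in counts.items())


for row in daily_rows(edits):
    print(row)
# (2023, 10, 17, 101, 0, 'desktop_ui', 2)
# (2023, 10, 18, 101, 0, 'mobile_ui', 1)
```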

Manuel renamed this task from [Analytics] Operationalize "active UI editors" to [Analytics] Operationalize "UI editors". Oct 24 2023, 11:04 AM
Manuel updated the task description.

I made some updates to reflect that we might need to look back further than 30 days. This also addresses your comment, correct? Hope to be back for our Thursday work session!

Generally yes, @Manuel :) I'm working on getting all of these columns figured out now 😊 The last thing I'm considering is how we subset by namespace. Our conditions for UI edits are based on Wikidata-specific tags, with everything else falling into a catch-all API classification. We have good subsetting for mobile edits in both Wikidata and other namespaces, but not for desktop, because the only desktop condition we use is that the tags contain wikidata-ui and do not contain termbox. As a result, a lot of desktop traffic on other wikis will be classified as API edits. Do we want to restrict this to just Wikidata?
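The following sketch expresses these classification rules in code. Only the desktop rule comes from the comment above: the tags contain `wikidata-ui` but not `termbox`, and anything unmatched falls through to "other". The mobile conditions are not spelled out here, so `MOBILE_TAGS` is a placeholder assumption. The `wikidata_only` switch is a hypothetical way to handle the scoping question. Input is a list of tag strings like the `revision_tags` array in `wmf.mediawiki_history`, plus a `wiki_db` value such as `wikidatawiki`.

```python
# Desktop rule as described in the discussion.
DESKTOP_TAG = "wikidata-ui"
EXCLUDED_FROM_DESKTOP = "termbox"
# Placeholder assumption: replace with the actual mobile conditions.
MOBILE_TAGS = {"termbox", "mobile edit"}


def access_type(revision_tags, wiki_db, wikidata_only=True):
    """Classify one edit as 'desktop_ui', 'mobile_ui' or 'other'.
    Returns None for edits outside the Wikidata-only scope, so they are
    dropped instead of being misfiled as API ('other') edits."""
    if wikidata_only and wiki_db != "wikidatawiki":
        return None
    tags = set(revision_tags)
    if tags & MOBILE_TAGS:
        return "mobile_ui"
    if DESKTOP_TAG in tags and EXCLUDED_FROM_DESKTOP not in tags:
        return "desktop_ui"
    return "other"  # catch-all, including API and app edits


print(access_type(["wikidata-ui"], "wikidatawiki"))  # desktop_ui
print(access_type([], "wikidatawiki"))  # other
print(access_type(["wikidata-ui"], "enwiki"))  # None
```

The `None` branch shows the trade-off under discussion. Restricting the scope to Wikidata avoids counting desktop edits on other wikis as API edits, because those edits would not carry the Wikidata-specific tags.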

Also, as far as device_type is concerned, @Manuel, we do not have access to a user_agent field in wmf.mediawiki_history, which is where we have the revision tags. Since users can switch between devices when editing, it would be best to get this directly from the source of the interaction, meaning that it would likely need to come from the edits table we're designing schemas for in Figma. All the other fields are ready though 😊

For namespace, we're talking about User: and similar namespaces for projects, discussions, etc. :)
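To make the namespace column readable, numeric namespace IDs could be mapped to labels. This minimal sketch assumes the standard Wikidata IDs (0 Item, 2 User, 4 Project, i.e. "Wikidata:", 120 Property, 146 Lexeme). It uses the MediaWiki convention that positive odd IDs are talk namespaces. The coarse grouping and the `namespace_group` helper are hypothetical, and any unlisted namespace falls into "Other".

```python
# Hypothetical coarse grouping of Wikidata namespace IDs for the table.
NAMESPACE_LABELS = {
    0: "Item",
    2: "User",
    4: "Project",  # the "Wikidata:" namespace
    120: "Property",
    146: "Lexeme",
}


def namespace_group(ns: int) -> str:
    """Map a namespace ID to a coarse label; positive odd IDs are
    MediaWiki talk namespaces (discussions)."""
    if ns > 0 and ns % 2 == 1:
        return "Talk"
    return NAMESPACE_LABELS.get(ns, "Other")


print(namespace_group(0), namespace_group(3), namespace_group(146))  # Item Talk Lexeme
```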