
Ingest data from PrefUpdate EventLogging schema into Druid
Closed, Resolved · Public · 3 Estimated Story Points

Description

Ingest some of the data from the PrefUpdate schema into Druid, so that it can be viewed in Superset or Turnilo.

Per a review of the draft schema guidelines and of previous cases such as T202751, this should be possible with the following fields as dimensions:

  • property
  • isDefault

The property field has fairly high cardinality, with about 2000 distinct values as of last month. Apart from that, the dataset would be very simple: the sole measure would be the number of actions (events), aggregated by hour, for example.
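The count-only rollup described above can be illustrated with a minimal Python sketch. It assumes each event is a dict with an ISO 8601 timestamp plus the property and isDefault fields. The record layout, the rollup_hourly helper, and the property names prefA/prefB are hypothetical. In practice Druid performs the equivalent aggregation itself at ingestion time, so this only shows what the stored rows would represent.

```python
from collections import Counter
from datetime import datetime


def rollup_hourly(events):
    """Count events per (hour, property, isDefault) combination.

    Hypothetical helper mimicking a count-only rollup with hourly
    granularity; it is not the actual ingestion code.
    """
    counts = Counter()
    for ev in events:
        ts = datetime.fromisoformat(ev["timestamp"])
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, ev["property"], ev["isDefault"])] += 1
    return counts


# Hypothetical sample events: two opt-outs of prefA within the same hour,
# then one switch back to the default in the following hour.
events = [
    {"timestamp": "2019-01-16T10:05:00", "property": "prefA", "isDefault": False},
    {"timestamp": "2019-01-16T10:40:00", "property": "prefA", "isDefault": False},
    {"timestamp": "2019-01-16T11:02:00", "property": "prefA", "isDefault": True},
]
counts = rollup_hourly(events)
for (hour, prop, is_default), n in sorted(counts.items()):
    print(hour.isoformat(), prop, is_default, n)
```

Because the only measure is a count, hourly rows can later be summed into daily totals without losing information. The high cardinality of property multiplies the number of rows per hour, but each row remains a single integer.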

We'll also need the following standard event capsule fields, in order to monitor the rate of preference changes on particular projects or from particular kinds of clients:

  • Wiki
  • Browser Family
  • Browser Major
  • Browser Minor
  • Device Family
  • OS Family
  • OS Major
  • OS Minor

The following event fields should be left out:

  • version (apparently not in use anyway)
  • saveTimestamp
  • userId
  • value (inconsistent across properties, probably of high cardinality, and partly redundant with isDefault for the use cases motivating this ticket)
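The field selection above amounts to a whitelist of dimensions. The minimal Python sketch below assumes a refined record in which the event fields sit under an "event" key and the parsed user-agent fields sit under a "useragent" key. Both that layout and the flattened output names (event_property, useragent_browser_family, and so on) are hypothetical, as is the project_event helper. Only the listed fields are copied, so excluded fields such as userId never reach the datasource.

```python
# Dimensions kept per the task description. The nested record layout and the
# snake_case user-agent keys are assumptions, not a confirmed schema.
EVENT_DIMENSIONS = ("property", "isDefault")
CAPSULE_DIMENSIONS = ("wiki",)
USERAGENT_DIMENSIONS = (
    "browser_family",
    "browser_major",
    "browser_minor",
    "device_family",
    "os_family",
    "os_major",
    "os_minor",
)


def project_event(record):
    """Return a flat dict holding only the whitelisted dimensions.

    Event fields not listed above (version, saveTimestamp, userId, value)
    are never copied. Missing fields come out as None.
    """
    event = record.get("event", {})
    useragent = record.get("useragent", {})
    row = {"event_" + name: event.get(name) for name in EVENT_DIMENSIONS}
    row.update({name: record.get(name) for name in CAPSULE_DIMENSIONS})
    row.update(
        {"useragent_" + name: useragent.get(name) for name in USERAGENT_DIMENSIONS}
    )
    return row


# Hypothetical raw record, including the fields that must be dropped.
raw = {
    "wiki": "enwiki",
    "event": {
        "property": "prefA",
        "isDefault": False,
        "value": "1",
        "userId": 12345,
        "saveTimestamp": "20190116100500",
        "version": 1,
    },
    "useragent": {"browser_family": "Firefox", "browser_major": "64", "os_family": "Linux"},
}
row = project_event(raw)
print(row)
```

A whitelist is the safer choice here: if new fields are later added to the schema, they stay out of Druid until someone explicitly lists them.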

A current use case would be for the web team to easily monitor the rate of opt-ins to and opt-outs from the new Advanced Mobile Contributions mode (first deployed yesterday), per project; see e.g. T211197#4951773 ff. But there are many other use cases, given the multitude of user preferences for which the rate of users switching from or to the default state can be of interest.

Event Timeline

As mentioned in the task, since we just did the first deployment for AMC, it would be really helpful if we could monitor the opt-in rate as we proceed with community announcements and further deployments.

fdans triaged this task as High priority.
fdans added a project: Analytics-Kanban.
fdans moved this task from Incoming to Smart Tools for Better Data on the Analytics board.

Thanks @fdans - we'll also need at least some of the standard fields from the event capsule, as in previous EventLogging ingestions (e.g. the aforementioned T202751, where these were understood to be included without being listed explicitly in the task description). I should have listed them here for clarity and will do so now.

Acknowledged @Tbayer, the capsule fields are now loaded too in the test period.

Great - the capsule dimensions look good to me. It doesn't yet seem possible to switch to a daily time series; perhaps that is an artifact of the short test period? (The dataset seems to contain data from both Jan 14 and Jan 15, but splitting by time and selecting 1D granularity results in a chart consisting of just one dot, for Jan 15.)

Thanks! To me this looks good to go now, except perhaps that the x-axis labels seem a bit odd: each day appears twice ("Wed 16 Wed 16"), with the second "Wed 16" actually located at the start of Jan 17.

That link looks great overall. There seems to be a one-day discrepancy, though, between the dates given on the x-axis and those in the mouseover. Also, I'm having trouble accessing this view: the chart never materializes and the spinner keeps spinning even after 5-10 minutes (tried in both Firefox and Chromium). Perhaps a general Turnilo issue?

What is the update frequency going to be - hourly?

Probably daily

Change 502462 had a related patch set uploaded (by Fdans; owner: Fdans):
[operations/puppet@production] Add PrefUpdate as event schema to ingest to druid

https://gerrit.wikimedia.org/r/502462

Change 502462 merged by Elukey:
[operations/puppet@production] Add PrefUpdate as event schema to ingest to druid

https://gerrit.wikimedia.org/r/502462

Nuria set the point value for this task to 3.