
Validate android_breadcrumbs_event data
Closed, Resolved (Public)

Description

  • Verify data is visible in android_breadcrumbs_event for beta app users

Confirmed we are seeing data in android_breadcrumbs_event for beta app users. More in-depth data quality analysis is in progress.

We will also monitor incoming data over the coming days to determine whether it needs to be sampled by device to limit data volume.

Event Timeline

mpopov triaged this task as Medium priority. Jun 21 2022, 5:08 PM
mpopov edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
mpopov moved this task from Next 2 weeks to Blocked on the Product-Analytics (Kanban) board.
mpopov subscribed.

Blocked on availability of data from production release

Based on data volume estimates (Beta breadcrumb events per user, plus DAU and event counts from MobileWikiAppSessions (link)), we anticipate events in android_breadcrumbs_event to be ~183% higher than what we see daily in MobileWikiAppSessions. I'm cc'ing @Ottomata, as we discussed, to give DataEng a heads up on our anticipated daily event data volume. We currently have this feature-flagged for Beta only, pending signoff (or a decision that we should indeed sample this data).

Okay, so about 200-250 events per second? I think we should be okay. Note that @phuedx wants to disable sampling for edit attempt step soon, which IIRC will be about the same volume.

I think this will be fine, let's just keep an eye on things in the eventgate dashboard when this happens (e.g. let us know when it is enabled.)

But the q we always gotta ask is: do you need to collect 100% of events? If you don't, sampling is preferred! But if you do, then okayyyyyy :)
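
For a rough sense of scale, the 200-250 events per second figure can be turned into daily totals. This is only back-of-the-envelope arithmetic: it assumes traffic is spread evenly over the day and that sampling by device reduces volume roughly in proportion to the rate. The function name and the example rate of 0.5 are illustrative, not taken from any existing tooling.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_events(events_per_second: float, sample_rate: float = 1.0) -> int:
    """Approximate number of events per day after sampling.

    Assumes an even event rate over the day and that per-device sampling
    scales total volume in proportion to sample_rate.
    """
    return round(events_per_second * SECONDS_PER_DAY * sample_rate)


if __name__ == "__main__":
    for eps in (200, 250):
        # Unsampled volume vs. volume at a hypothetical 50% device sample.
        print(eps, daily_events(eps), daily_events(eps, sample_rate=0.5))
```

Under those assumptions, 200-250 events per second is about 17.3-21.6 million events per day unsampled, and about 8.6-10.8 million at a 0.5 rate.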

What about starting with 50% of installs? So the stream config would look like:

'android.breadcrumbs_event' => [
  'schema_title' => 'analytics/mobile_apps/android_breadcrumbs_event',
  'destination_event_service' => 'eventgate-analytics-external',
  'sample' => [
    'unit' => 'device',
    'rate' => 0.5,
  ],
],
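
To illustrate what sampling with 'unit' => 'device' and 'rate' => 0.5 means in practice, here is a minimal sketch of deterministic per-device sampling. It is not the actual client implementation. The hashing scheme and the names `in_sample` and `device_id` are assumptions made for illustration. The idea is that a given device is always either in or out of the sample, rather than each event being sampled independently.

```python
import hashlib


def in_sample(device_id: str, rate: float) -> bool:
    """Return True if this device falls inside the sampled fraction.

    Hypothetical scheme: hash the device/install ID and map it to a
    number in [0, 1); the device is in the sample if that number is
    below the configured rate.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Use the top 53 bits of the hash so the division is exact and the
    # result is strictly less than 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


if __name__ == "__main__":
    ids = [f"install-{i}" for i in range(10_000)]
    sampled = sum(in_sample(i, 0.5) for i in ids)
    print(f"{sampled} of {len(ids)} hypothetical installs are in the sample")
```

Because the decision depends only on the ID and the rate, the same install makes the same decision on every event and every session, which keeps per-device event sequences intact for the devices that are sampled.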

@SNowick_WMF @Sharvaniharan @cooltey @Dbrant: By the way, I would recommend splitting the stream into a beta stream and a production release stream, so the sampling rates can be configured separately. Then change the instrumentation so the stream name is based on which release of the app it is (beta or production). The full stream config would be:

'android.breadcrumbs.beta' => [
  'schema_title' => 'analytics/mobile_apps/android_breadcrumbs_event',
  'destination_event_service' => 'eventgate-analytics-external',
],
'android.breadcrumbs.production' => [
  'schema_title' => 'analytics/mobile_apps/android_breadcrumbs_event',
  'destination_event_service' => 'eventgate-analytics-external',
  'sample' => [
    'unit' => 'device',
    'rate' => 0.5,
  ],
],

You can also use this split-stream pattern with other instruments, although this is the first one that really benefits from it (due to volume).
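
As a rough sketch of the instrumentation side of this split (language-neutral logic written in Python, not the app's actual code), the stream name could be chosen from the build type, with the effective sampling rate looked up from a config that mirrors the one above. The helper names and the `is_beta_build` flag are hypothetical, and treating a stream with no 'sample' entry as unsampled (rate 1.0) is an assumption.

```python
# Mirrors the proposed stream config; only the sampling part is modelled.
STREAM_CONFIGS = {
    "android.breadcrumbs.beta": {},
    "android.breadcrumbs.production": {
        "sample": {"unit": "device", "rate": 0.5},
    },
}


def stream_for_release(is_beta_build: bool) -> str:
    """Pick the stream name based on which release of the app is running."""
    if is_beta_build:
        return "android.breadcrumbs.beta"
    return "android.breadcrumbs.production"


def effective_rate(stream_name: str) -> float:
    """Return the configured sampling rate for a stream.

    Assumption: a stream without a 'sample' entry is not sampled,
    i.e. all of its events are sent.
    """
    sample = STREAM_CONFIGS[stream_name].get("sample")
    return sample["rate"] if sample else 1.0


if __name__ == "__main__":
    for is_beta in (True, False):
        name = stream_for_release(is_beta)
        print(name, effective_rate(name))
```

With this shape, changing the production rate later is a config-only change, and beta data stays complete for validation.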

Thanks @Ottomata, we do not need all data so I'm fine with starting with 50% and following @mpopov's recommended settings. Will keep an eye on eventgate and post updates here when we're up and running.

Change 811765 had a related patch set uploaded (by Dbrant; author: Dbrant):

[operations/mediawiki-config@master] Add sampling to android.breadcrumbs event stream.

https://gerrit.wikimedia.org/r/811765

Change 811765 merged by jenkins-bot:

[operations/mediawiki-config@master] Add sampling to android.breadcrumbs event stream.

https://gerrit.wikimedia.org/r/811765

Mentioned in SAL (#wikimedia-operations) [2022-07-25T13:58:01Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:811765|Add sampling to android.breadcrumbs event stream. (T310847)]] (duration: 02m 56s)

Mentioned in SAL (#wikimedia-operations) [2022-07-25T13:59:27Z] <Lucas_WMDE> lucaswerkmeister-wmde@mw1320:~$ scap pull # T310847 (repeat failed host from earlier sync)

Mentioned in SAL (#wikimedia-operations) [2022-07-25T14:01:04Z] <Lucas_WMDE> lucaswerkmeister-wmde@mw1320:~$ sudo -i /usr/local/sbin/restart-php7.2-fpm # T310847 just in case