
HomepageVisit schema validation errors
Closed, Resolved (Public)

Description

There is no data logged for HomepageVisit after 2020-10-26, the last received event has a timestamp of 2020-10-26T18:47:31Z. Checking logstash shows schema validation errors, specifically that start_email_state and start_tutorial_state are required properties.

From a count of the number of events in this schema, it looks like this issue started around the Variant C/D deployment on 2020-10-19, at which point the number of daily events dropped by almost 50%. The second drop coincides with the deployment of Variant D as the default (ref T265556#6579676). After that, no events are flowing in.

  1. Does the schema definition need updating based on the changes from Variant C & D?
  2. Are there changes needed to the logging code on the server side as well?
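To illustrate the failure mode described above, here is a minimal sketch of a required-property check. The schema fragment is a hypothetical, trimmed-down stand-in that lists only the two properties named in the Logstash errors; the real schema on meta may express required-ness differently, and the event fields are illustrative.

```python
# Hypothetical, trimmed-down schema fragment: only the two required
# properties named in the validation errors are listed.
SCHEMA_FRAGMENT = {
    "required": ["start_email_state", "start_tutorial_state"],
}


def missing_required(event: dict, schema: dict) -> list:
    """Return the required property names that are absent from the event."""
    return [name for name in schema.get("required", []) if name not in event]


# An event shaped like one logged after the start module was removed
# (field values are illustrative, not real data).
event = {"is_mobile": False, "impact_module_state": "unactivated"}

print(missing_required(event, SCHEMA_FRAGMENT))
```

An event with any name in that list would be rejected by the validator and never reach the table, which matches the observed drop to zero events.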

Event Timeline

I'm adding Analytics Engineering to this task, although there's nothing specific here for them to do. Instead, I'm wondering if the sudden drop in events, particularly the 100% decrease after October 26, and the corresponding increase in schema errors are something they have automated systems to flag?

If not, perhaps I should do a weekly check of event volume for our schemas through Grafana? Or maybe something to add to @MMiller_WMF's Homepage reporting notebook?

@nettrom_WMF -- we'll talk about this task in our team meeting on Monday. Does this disturb your analysis for Variants C and D?

It does not. We use HomepageModule as the basis for the Variant C/D analysis, so we don't need complex strategies to account for ad blockers and non-JS users. As far as I know, we don't have any ongoing analysis that relies on HomepageVisit (I'm fairly sure we'd have caught this sooner if we did). Priority-wise, I think it would be good to have this fixed before January comes around, so we know we have complete data from then on.

We want to fix and backport this this week. @Tgr will take care of it.

We removed the tutorial module (along with the entire start module, which has been replaced by startemail) in T258008: Variant C/D: smaller start module, so start_tutorial_state should probably be removed from the schema altogether. The same goes for start_userpage_state and start_startediting_state (which are optional fields). start_email_state should just be fixed to use the startemail module, unless it would be annoying for @nettrom_WMF if it didn't match the module name properly, in which case we should rename it to start_startemail_state.

I don't know how backporting works with the new git-hosted schemas and this is probably not the best time to find out, so let's just set the to-be-removed fields to a fake complete value for now.

Change 649481 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/GrowthExperiments@master] Update HomepageVisit logging

https://gerrit.wikimedia.org/r/649481

I don't know how backporting works with the new git-hosted schemas and this is probably not the best time to find out, so let's just set the to-be-removed fields to a fake complete value for now.

On second thought, backend schemas have not been migrated yet, and we only report HomepageVisit from the backend, so updating on meta should work.

...in which case we should rename it to start_startemail_state.

Oops, I mean startemail_state.

start_email_state should just be fixed to use the startemail module, unless it would be annoying for @nettrom_WMF if it didn't match the module name properly[…]

I'm for keeping the name, because as far as I can tell from T258008 it continues to work the same way with three states (noemail, unconfirmed, confirmed).

And as you discovered HomepageVisit isn't migrated to MEP yet because of T253121, so the canonical place to modify it is metawiki. All the proposed changes sound good to me!

I'm for keeping the name, because as far as I can tell from T258008 it continues to work the same way with three states (noemail, unconfirmed, confirmed).

Indeed, the behavior did not change.
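As a minimal sketch of the three-state behavior discussed above: the function name and its two inputs are hypothetical and are not taken from the GrowthExperiments code; only the three state names come from this task.

```python
from typing import Optional


def start_email_state(email: Optional[str], is_confirmed: bool) -> str:
    """Map a user's email status to one of the three logged states.

    Hypothetical helper: the inputs are assumptions made for illustration.
    """
    if not email:
        return "noemail"
    return "confirmed" if is_confirmed else "unconfirmed"


print(start_email_state(None, False))
print(start_email_state("user@example.org", False))
print(start_email_state("user@example.org", True))
```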

So, the patch above will fix this issue. More generally, we could do one or both of two things:

  • include eventlogging errors related to our schemas in our Logstash dashboard to have more visibility for schema errors
  • alert about sudden drops in analytics data.

I'll add the latter to T268700: Investigate setting up alerts for Growth dashboards.
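A rough sketch of what the second option could look like, assuming daily event counts are already available as (date, count) pairs. The function name, the 40% threshold, and the sample numbers are all invented for illustration; they are not real HomepageVisit volumes.

```python
def find_drops(daily_counts, threshold=0.4):
    """Flag days whose count fell by at least `threshold` (as a fraction)
    relative to the previous entry.

    daily_counts: list of (date_string, count) pairs sorted by date.
    Returns a list of (date_string, previous_count, current_count).
    """
    drops = []
    for (_, prev), (day, cur) in zip(daily_counts, daily_counts[1:]):
        if prev > 0 and (prev - cur) / prev >= threshold:
            drops.append((day, prev, cur))
    return drops


# Invented numbers that only mimic the shape of the incident:
# a drop of roughly half, then a drop to zero.
counts = [
    ("2020-10-18", 1000),
    ("2020-10-19", 520),
    ("2020-10-20", 510),
    ("2020-10-27", 0),
]

for day, prev, cur in find_drops(counts):
    print(f"{day}: {prev} -> {cur}")
```

Comparing against the previous entry is the simplest possible rule; a real alert would probably compare against the same weekday or a rolling average to avoid firing on normal weekly variation.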

Change 649481 merged by jenkins-bot:
[mediawiki/extensions/GrowthExperiments@master] Update HomepageVisit logging

https://gerrit.wikimedia.org/r/649481

Checked Schema:HomepageVisit in betalabs: events are being recorded, and the schema appears to have been updated.

A couple of questions for @Tgr.

Out of a total of 42 recorded events:

@deployment-eventlog05:/srv/log/eventlogging$ grep  "HomepageVisit" client-side-events.log |wc
     42      42   39051

(1) Why do some records have "clientValidated":false?
25 records have "clientValidated":false and the other 17 have true.

@deployment-eventlog05:/srv/log/eventlogging$ grep  "HomepageVisit" client-side-events.log |grep   "clientValidated%22%3Atrue%2C%22"|wc 
     17      17   16558

(2) Why do some records with "clientValidated":false not record start_email_state?
For many records, "clientValidated":false correlates with the absence of start_email_state.

@deployment-eventlog05:/srv/log/eventlogging$ grep  "HomepageVisit" client-side-events.log |grep   "start_email_state" |wc
     21      21   20284

For comparison, the count for impact_module_state is the same as the total number of HomepageVisit records.
I checked one of the users for whom start_email_state was missing. The Homepage was enabled for that user when the account was created. My attempts to reproduce the issue were not successful. I also checked whether all email states ("noemail", "unconfirmed", "confirmed") have been recorded, and they were.
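The grep counts above could also be produced in one pass with a small script. This is a sketch under assumptions: it assumes each log line starts with a percent-encoded `?{json};` capsule followed by tab-separated fields, which is inferred from the grep patterns used here rather than from a documented format. The helper names and the sample lines are made up for illustration.

```python
import json
from urllib.parse import quote, unquote


def parse_line(line: str) -> dict:
    """Decode the JSON capsule at the start of one log line.

    Assumed layout: '?<percent-encoded JSON>;' then tab-separated fields.
    """
    raw = unquote(line.split("\t", 1)[0]).strip()
    return json.loads(raw.lstrip("?").rstrip(";"))


def summarize(lines, schema="HomepageVisit"):
    """Count records for `schema`, and how many of them are client-validated
    or carry start_email_state."""
    total = validated = with_email = 0
    for line in lines:
        try:
            record = parse_line(line)
        except ValueError:
            continue  # skip lines that do not match the assumed layout
        if record.get("schema") != schema:
            continue
        total += 1
        if record.get("clientValidated"):
            validated += 1
        if "start_email_state" in record.get("event", {}):
            with_email += 1
    return {"total": total, "client_validated": validated,
            "with_start_email_state": with_email}


def fake_line(schema, event, client_validated):
    """Build a made-up log line in the assumed layout (illustration only)."""
    capsule = {"event": event, "schema": schema,
               "clientValidated": client_validated}
    return "?" + quote(json.dumps(capsule)) + ";\texample-host\t1\t2021-01-04T06:29:02"


sample = [
    fake_line("HomepageVisit", {"start_email_state": "noemail"}, True),
    fake_line("HomepageVisit", {"impact_module_state": "unactivated"}, False),
    fake_line("OtherSchema", {}, True),
]

print(summarize(sample))
```

Pointing `summarize` at an open file handle for client-side-events.log would give all three counts at once, as long as the real lines follow the assumed layout.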

"?{"event":{"is_mobile":false,
"referer_route":"personaltoolslink",
"referer_namespace":-1,"referer_action":"view",

"user_id":48087,"user_editcount":0,
"impact_module_state":"unactivated","homepage_pageview_token":"5ffrn0gkk2od8485et8e7rhopuo250uf"},"schema":"HomepageVisit","revision":20021981,

"clientValidated":false,"wiki":"enwiki","webHost":"en.wikipedia.beta.wmflabs.org","userAgent":"Mozilla/5.0\u0020(Windows\u0020NT\u002010.0)\u0020AppleWebKit/537.36\u0020(KHTML,\u0020like\u0020Gecko)\u0020Chrome/64.0.3282.167\u0020Safari/537.36"};\tdeployment-cache-text06.deployment-prep.eqiad.wmflabs\t3697075\t2021-01-04T06:29:02\t172.16.4.119\t\"MediaWiki/1.36.0-alpha\""

No idea how that would happen. start_email_state is only omitted when the startemail module is not present, and that was only possible for variant A which doesn't exist anymore.

...unless you are looking at entries from before merging the patch, in which case this is the bug the task is about: some required fields weren't set.

No idea how that would happen. start_email_state is only omitted when the startemail module is not present, and that was only possible for variant A which doesn't exist anymore.

Yes, I thought it was something like that.