
[SPIKE] Investigate metrics platform test inconsistencies in iOS
Closed, Resolved · Public · Spike

Description

Follow-up to T264193: Investigate differences in counts between legacy & modern-v1 Edit History Compare events. That task documented slight inconsistencies in the numbers of events sent by the same instrumentation to the legacy and modern event intake systems. In theory, the numbers should be identical, but the observed results showed a slight discrepancy in the numbers of events sent, with more events making it to the legacy than the modern system.

The very minor differences in counts across the different types of interactions with the Edit History Compare feature are a little weird:

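| action | n_total_legacy_events | n_total_modern_events | percent_difference, (new-old)/old |
|---|---|---|---|
| show_history | 2369 | 2309 | -2% |
| thank_fail | 15 | 15 | 0% |
| thank_success | 16 | 16 | 0% |
| revision_view | 4183 | 4127 | -1% |
| compare2 | 72 | 69 | -4% |
| thank_try | 57 | 57 | 0% |
| compare1 | 177 | 174 | -1% |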
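As a quick check of the percent_difference column, the sketch below recomputes (new-old)/old from the counts in the table. The use of `int()` (truncation toward zero) is an assumption about how the whole-number figures were derived; it happens to reproduce every value shown, but the original query's rounding behaviour is not stated here.

```python
# Counts copied from the table above: action -> (legacy, modern).
counts = {
    "show_history": (2369, 2309),
    "thank_fail": (15, 15),
    "thank_success": (16, 16),
    "revision_view": (4183, 4127),
    "compare2": (72, 69),
    "thank_try": (57, 57),
    "compare1": (177, 174),
}


def percent_difference(legacy: int, modern: int) -> float:
    """Relative difference (new - old) / old, expressed in percent."""
    return 100 * (modern - legacy) / legacy


for action, (legacy, modern) in counts.items():
    exact = percent_difference(legacy, modern)
    # int() truncates toward zero; this is an assumption about how the
    # whole-number figures in the table were produced.
    print(f"{action:14s} {exact:6.2f}%  (truncated: {int(exact)}%)")
```

The unrounded values make it easier to see that the largest gap (compare2) rests on a difference of only three events.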

We need to investigate why the instrumentation (EditHistoryCompareFunnel.swift) produces fewer MEP events in some cases.

The leading hypothesis explaining this discrepancy is that the MEP client did not re-implement the unique storage model used by the iOS app for legacy eventlogging. For legacy eventlogging, unlike other platforms, the iOS app immediately stores events to a database table of pending events, then tries periodically to submit them for a period of up to 30 days. As part of T261987: MEP Client iOS (Revision), this pattern was reimplemented in the MEP client in order to test whether doing so would resolve the discrepancies.
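To make the storage model concrete, here is a minimal Python sketch of a persist-then-retry queue. It is not the iOS app's or the MEP client's actual code: the class name, table name, and `send` callback are hypothetical, and only the behaviour described above (store immediately, retry periodically, give up after 30 days) is modelled.

```python
import json
import sqlite3

RETENTION_SECONDS = 30 * 24 * 60 * 60  # "up to 30 days", per the description


class PendingEventStore:
    """Hypothetical persist-then-retry queue; an illustration only."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending_events ("
            "id INTEGER PRIMARY KEY, created_at REAL NOT NULL, payload TEXT NOT NULL)"
        )

    def record(self, event, now):
        # Persist immediately so the event survives an app quit or offline period.
        self.db.execute(
            "INSERT INTO pending_events (created_at, payload) VALUES (?, ?)",
            (now, json.dumps(event)),
        )
        self.db.commit()

    def flush(self, send, now):
        # Drop events that have been pending longer than the retention window.
        self.db.execute(
            "DELETE FROM pending_events WHERE created_at < ?",
            (now - RETENTION_SECONDS,),
        )
        rows = self.db.execute(
            "SELECT id, payload FROM pending_events ORDER BY id"
        ).fetchall()
        delivered = 0
        for row_id, payload in rows:
            if send(json.loads(payload)):  # send() is assumed to return True on success
                self.db.execute("DELETE FROM pending_events WHERE id = ?", (row_id,))
                delivered += 1
        self.db.commit()
        return delivered

    def pending_count(self):
        return self.db.execute("SELECT COUNT(*) FROM pending_events").fetchone()[0]
```

An event recorded while offline stays in the table across failed `flush` calls and is delivered by the first successful one, which is the property a purely in-memory queue lacks.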

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". (Apr 23 2021, 7:57 PM)
Restricted Application added a subscriber: Aklapper.

I slightly modified and re-ran Shay's query from T264193 for version 6.8, and these were the results. After the re-implementation of the storage model, we are now seeing slightly more events produced on the MEP side.

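| action | n_total_legacy_events | n_total_modern_events | percent_difference |
|---|---|---|---|
| revision_view | 5263 | 5371 | 2 |
| thank_success | 7 | 7 | 0 |
| show_history | 3019 | 3057 | 1 |
| thank_try | 58 | 58 | 0 |
| compare2 | 94 | 94 | 0 |
| compare1 | 206 | 208 | 0 |
| thank_fail | 18 | 18 | 0 |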

I am not sure how concerned we are about these very small discrepancies or what actions we intend to take as a result of this analysis. If I recall correctly, last time we (Jason and Toni and I) discussed the issue, we decided not to maintain the iOS-specific storage model in the iOS MEP client anyway.

-- Per-install, per-action event counts from the legacy EventLogging table.
WITH user_event_counts_legacy AS (
  SELECT
    event.app_install_id AS app_install_id, event.action AS action, COUNT(1) AS n_legacy_events
  FROM event.mobilewikiappiosedithistorycompare
  WHERE year = 2021
    AND useragent.os_family = 'iOS'
    -- String comparison: a version such as '6.10' would sort before '6.8'.
    AND useragent.wmf_app_version >= '6.8'
  GROUP BY event.app_install_id, event.action
), user_event_counts_modern AS (
  -- The same counts from the modern (MEP) table.
  SELECT
    app_install_id, action, COUNT(1) AS n_modern_events
  FROM event.ios_edit_history_compare
  WHERE year = 2021
    AND user_agent_map['os_family'] = 'iOS'
    AND user_agent_map['wmf_app_version'] >= '6.8'
  GROUP BY app_install_id, action
), user_event_counts_joined AS (
  -- Inner join: only (app_install_id, action) pairs present in both tables are kept.
  SELECT
    app_install_id, action, n_legacy_events, n_modern_events
  FROM user_event_counts_legacy AS legacy
  JOIN user_event_counts_modern AS modern
  USING (app_install_id, action)
)
-- Totals per action and the relative difference (modern - legacy) / legacy, in percent.
SELECT
  action,
  SUM(n_legacy_events) AS n_total_legacy_events,
  SUM(n_modern_events) AS n_total_modern_events,
  100 * (SUM(n_modern_events) - SUM(n_legacy_events)) / SUM(n_legacy_events) AS percent_difference
FROM user_event_counts_joined
GROUP BY action;
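One property of this query may be worth keeping in mind when reading the totals: the inner join keeps only (app_install_id, action) pairs that appear in both tables, so an install whose events reached only one system contributes nothing to either sum. The toy example below uses Python's sqlite3 module with made-up rows and simplified table names (`legacy`, `modern`) to show the effect; it is a sketch of the join semantics, not a re-run of the analysis.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE legacy (app_install_id TEXT, action TEXT);
CREATE TABLE modern (app_install_id TEXT, action TEXT);
-- Install 'a' reaches both systems; install 'b' reaches only legacy (made-up data).
INSERT INTO legacy VALUES ('a', 'compare1'), ('a', 'compare1'), ('b', 'compare1');
INSERT INTO modern VALUES ('a', 'compare1'), ('a', 'compare1');
""")

# Same shape as the query above: per-install counts, then an inner join.
joined = db.execute("""
WITH l AS (SELECT app_install_id, action, COUNT(1) AS n_legacy
           FROM legacy GROUP BY app_install_id, action),
     m AS (SELECT app_install_id, action, COUNT(1) AS n_modern
           FROM modern GROUP BY app_install_id, action)
SELECT action, SUM(n_legacy), SUM(n_modern)
FROM l JOIN m USING (app_install_id, action)
GROUP BY action
""").fetchall()

# Totals computed independently per table, with no join.
legacy_total = db.execute("SELECT COUNT(1) FROM legacy").fetchone()[0]
modern_total = db.execute("SELECT COUNT(1) FROM modern").fetchone()[0]

print(joined)                      # [('compare1', 2, 2)]
print(legacy_total, modern_total)  # 3 2
```

In this toy data the joined totals are equal even though one legacy event never reached the modern table, so the joined comparison measures discrepancies only among installs seen by both systems.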

A number of the hypotheses I listed in T281001: [SPIKE] Investigate legacy vs. modern event submission inconsistencies for Android user contribution screen could also help to explain the loss of events on iOS prior to the reimplementation of the storage model:

  1. Events are sent to the old system immediately but enqueued for a period of time with the new client library. A small number of enqueued events that were submitted to the old system may be lost before submission to the new system when the user quits the app.
  2. It would not be unexpected for transient network or server errors to produce a small number of discrepancies on either side. These should be approximately equally distributed between legacy and modern, assuming that the 5xx error rates for eventgate and the MediaWiki appservers are roughly equal.
  3. Users may be using ad blocking software that blocks requests to either or both endpoints. This depends on the ad blocking configuration used and is entirely out of our control.
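Hypothesis 1 can be illustrated with a small deterministic simulation. This is a toy model under stated assumptions, not either client's real behaviour: the legacy path is treated as sending every event immediately, the modern path buffers events in memory until a "flush" step, and a "quit" step discards whatever is still buffered. The step names and the `simulate` helper are hypothetical.

```python
def simulate(timeline):
    """Replay a list of steps: an action name, "flush", or "quit".

    Toy model of hypothesis 1 only: legacy sends immediately, modern buffers
    in memory until a flush, and quitting the app discards the buffer.
    """
    legacy_received = []
    modern_received = []
    modern_buffer = []
    for step in timeline:
        if step == "flush":
            modern_received.extend(modern_buffer)
            modern_buffer.clear()
        elif step == "quit":
            modern_buffer.clear()  # unsent buffered events are lost
        else:
            legacy_received.append(step)
            modern_buffer.append(step)
    return legacy_received, modern_received


timeline = ["show_history", "revision_view", "flush", "compare1", "quit"]
legacy, modern = simulate(timeline)
print(len(legacy), len(modern))  # prints: 3 2
```

Only the event recorded between the last flush and the quit goes missing on the modern side, which matches the small, one-directional loss pattern seen before the storage model was reimplemented.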

Context/Back-Story Regarding the iOS-specific storage model change:
The iOS app team was very interested in supporting offline use cases, both generally and as a matter of global equity. There are a number of reasons someone may use the app offline, so the team wanted to make sure events were saved.

The iOS app inserts events into a table and then retries sending them for up to 30 days. The initial plan was to do this for Android as well, but we decided against it.

We decided to stop doing this because we wanted to preserve users' battery and bandwidth by not constantly trying to send analytics. Dropping it also improves the overall quality of the events data we are storing locally.

Conclusion
Since we removed the iOS-specific storage model, we can logically expect a reduction in the number of events. Given this, we should update the QA process to indicate that event counts can be expected to differ: https://docs.google.com/spreadsheets/d/1ZpUEbGntW0NECyPsUijFgDwRXbczIOiTFwsGyN9yzL8/edit#gid=0.

Additionally, Product Analytics could check whether this impacts the overall insights taken away from the data, using a higher-volume dataset such as session ticks for the analysis.