Page MenuHomePhabricator

Investigate differences in counts between legacy & modern-v1 Edit History Compare events
Closed, ResolvedPublic

Description

The very minor differences in counts across the different types of interactions with the Edit History Compare feature are a little weird:

actionn_total_legacy_eventsn_total_modern_eventspercent_difference (new-old)/old
show_history23692309-2%
thank_fail15150%
thank_success16160%
revision_view41834127-1%
compare27269-4%
thank_try57570%
compare1177174-1%

We need to investigate why the instrumentation (EditHistoryCompareFunnel.swift) produces fewer MEP events in some cases.


https://superset.wikimedia.org/superset/sqllab?savedQueryId=107

WITH user_event_counts_legacy AS (
    SELECT
      event.app_install_id AS app_install_id, event.action AS action, COUNT(1) AS n_legacy_events
    FROM event.mobilewikiappiosedithistorycompare
    WHERE year = 2020 AND month = 9 AND day >= 16
      AND useragent.os_family = 'iOS'
      AND useragent.wmf_app_version >= '6.7'
    GROUP BY event.app_install_id, event.action
), user_event_counts_modern AS (
  SELECT
    app_install_id, action, COUNT(1) AS n_modern_events
  FROM event.ios_edit_history_compare 
  WHERE year = 2020 AND month = 9 AND day >= 16
    AND user_agent_map['os_family'] = 'iOS'
    AND user_agent_map['wmf_app_version'] >= '6.7'
  GROUP BY app_install_id, action
), user_event_counts_joined AS (
  SELECT
    app_install_id, action, n_legacy_events, n_modern_events
  FROM user_event_counts_legacy AS legacy
  JOIN user_event_counts_modern AS modern
  USING (app_install_id, action)
)
SELECT
  action,
  SUM(n_legacy_events) AS n_total_legacy_events,
  SUM(n_modern_events) AS n_total_modern_events,
  100 * (SUM(n_modern_events) - SUM(n_legacy_events)) / SUM(n_legacy_events) AS percent_difference
FROM user_event_counts_joined
GROUP BY action;

Event Timeline

mpopov created this task.Sep 30 2020, 2:27 PM
Restricted Application added a project: Wikipedia-iOS-App-Backlog. · View Herald TranscriptSep 30 2020, 2:27 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript

@Tsevener: I have a suspicion that this is caused by not doing a try-catch when logging events using the old system. Since the counts are consistently either exact or fewer (never more), it leads me to think that in rare cases if a log() call sends an event but errors out for whatever reason, that will prevent newLog() from being called. So we get the event from the old system, but not from the new system.

Alternatively, it may be that in some cases the legacy system sends duplicates of events, so the modern-v1 counts are correct and it's the legacy counts which are inflated. @SNowick_WMF would you be able to check the duplication hypothesis?

Tsevener added a comment.EditedSep 30 2020, 3:36 PM

@mpopov could this also simply be because we haven't added back in the app-side persistence of events yet? The legacy system would catch more events because we immediately persist them, giving us a good fallback in case anything unexpected happens (app terminates, log method errors, etc.).

jlinehan added a comment.EditedSep 30 2020, 4:04 PM

@mpopov could this also simply be because we haven't added back in the app-side persistence of events yet? The legacy system would catch more events because we immediately persist them, giving us a good fallback in case anything unexpected happens (app terminates, log method errors, etc.).

Agree this should be one of the hypotheses. Can we examine the difference between server and client timestamps? That would allow us to get a sense for how long an event "dwelled" before being sent. This isn't diagnostic without additional information but would be food for thought. Also, is there anything that the "excess" events on the legacy system have in common? We should be able to identify them by comparing rows in each table with some kind of rough timestamp match.

SNowick_WMF added a comment.EditedOct 1 2020, 9:36 PM

Did some initial digging, will add more as I go.

@mpopov and @Tsevener I did not find any duplicate events in mobilewikiappiosedithistorycompare. There are events with duplicate event dt and session_id but they do not match on client_dt but I assume that is ok, and I found some of those in data for actions where we are not off counts (perhaps those are persisted sessions). They did not add up to the difference between counts so probably not our answer. I did get some similar results from the modern ios_edit_history_compare table as well.

@jlinehan I pulled the time differences in seconds between event dt and client_dt for each action that's off, the median time is 19 seconds. Breaking it down, 76% of time differences were under 30 seconds and 24% time over 30 seconds. There was one event in show_history that had a positive value of 37 seconds which seems odd but probably not relevant. data here.

I'll investigate the anomalous events next and add findings here.

LGoto triaged this task as Medium priority.Oct 5 2020, 6:52 PM
LGoto moved this task from Needs Triage to Product Backlog on the Wikipedia-iOS-App-Backlog board.
SNowick_WMF added a comment.EditedOct 9 2020, 10:12 PM

For the events where we have anomalies I am seeing counts off on both tables, there is still not a consistent pattern I am seeing ( it isn't just a matter of events missing on modern which would add points to the theory that this is related to event persistence) but I am preparing some analysis that I will go over with @Tsevener and @mpopov when it's ready. I also ran the validation query again, we still are seeing higher counts on legacy table.

actionn_total_legacy_eventsn_total_modern_eventscount differencepercent_difference
thank_fail151500%
thank_try595900%
thank_success161600%
compare11851805-2%
compare275723-4%
show_history2527245473-2%
revision_view4415437243.9%

... preparing some analysis that I will go over with @Tsevener and @mpopov when it's ready

@SNowick_WMF mind if I join in that?

... preparing some analysis that I will go over with @Tsevener and @mpopov when it's ready

@SNowick_WMF mind if I join in that?

Sorry @jlinehan of course! Did not leave you out intentionally :)

I made spreadsheets with events for 2 actions show_history and compare1 on both Legacy and Modern tables. First page of sheets is data from Modern and the most left column with dt from Legacy to compare events by initial timestamp. Rows with blank legacy dt field are events that only appear in Modern, rows with blank meta.dt field are events that only appear in Legacy.

show_history
compare1

We can meet to go over findings, so far I don't recognize any patterns but fresh eyes may see something I don't yet.

@SNowick_WMF Thanks for looking into this! Just curious - is meta.dt the equivalent to the client_dt set in the event on the client side in MEP?

I've done some digging on my end. I still think the lack of persistence explains cases where events exist in the legacy system but not MEP. For cases explaining the other direction (events that exist in MEP but not in legacy) I can think of a couple of things (note this is all from code review and not any actual testing yet):

  1. The legacy system first uses a wifi-only connection to try sending events, so if the user isn't on Wifi the events should hang around in the database for a while without successfully posting. If it's been at least a full day since an event was last successfully posted, then the session opens it up to attempting over LTE. EPC doesn't have this additional logic, so I think users on LTE will send events to MEP and not the legacy system for 1 day. I wonder if some of the events missing in the legacy system will trickle in in the next day or so.
  2. Both systems will totally drop the events if there's some sort of error that is NOT a network connection failure. I notice the logic for detecting a server-side error is slightly different. The legacy system confirms the error domain is of type NSURLErrorDomain before considering it a network connection error, whereas EPC does not consider that, therefore is more lenient in considering an error a networking error that will cause the event to retry. They also slightly vary in HTTP status code detection for determining a success which could also cause differences.
  3. Also along this line of thought, if one service has a hiccup in a way that causes one libraries to not consider it a networking failure (like a 500 status code or something), that could cause the event to totally be dropped in either system. Doesn't seem like either of these systems would be unstable to the point of that being noticeable in the counts like that but worth mentioning.

@Tsevener to clarify MEP meta.dt is equal to Legacy dt, MEP client_dt is equal to Legacy event.event_dt (on Android it's event.client_dt)

More research for all to look through. thank_try and compare1. The first sheet has results.

Instead of lining events up by meta.dt and dt I used app_install_id and event.app_install_id. What this showed for the 2 actions I examined were a few app_install_ids that were on Legacy but not on Modern and quite a bit more app_install_ids (and therefore, events) that were on Modern and not on Legacy. The event counts I'm getting show more events on Modern, which could just be more recent events. See also counts below.

actionn_total_legacy_eventsn_total_modern_eventscount differencepercent_difference
thank_try6275131.3%
compare118822133.45%

I pulled data for the missing app_install_ids for 'thank_try' and did not find any recognizable pattern (dang it!) like region, iOS version, app version, anon, language, etc. The mystery continues.

FWIW I did see a few concurrent events from the same user that had fewer events on Modern than Legacy so the persistence issue is is still at play here,

More findings, I checked the rest of the iOS data schemas for events and records for the 12 app_install_ids that show up on Modern and not Legacy for thank_try and found that only 4 of the 12 app_install_ids appear in other tables. I've added those ids/events to the spreadsheet but it looks like most of these users were not getting tracked on other tables and none of them show up in mobilewikiappdailystats

Not sure exactly how data sharing 'opt-in' users are checked or if that's on the client side but might be something the Modern schema bypasses? Just guessing at this point.

@Tsevener to clarify MEP meta.dt is equal to Legacy dt, MEP client_dt is equal to Legacy event.event_dt (on Android it's event.client_dt)

A couple of points:

  • meta.dt is the timestamp of the request's ingestion (EventGate is configured to fill in meta.dt when it is missing), in that sense it's equivalent to legacy EventLogging's dt, and the correct approach here is indeed not to match them up, because the two requests (1 sent via log() and 1 sent via newLog()) will not be received & ingested at the same time
  • client_dt is equal to Legacy event.event_dt in definition only, not in value. Since the two calls – log() and newLog() – fill in the timestamp when they are called and independent of each other, they cannot be guaranteed to match up for the same event

Not sure exactly how data sharing 'opt-in' users are checked or if that's on the client side but might be something the Modern schema bypasses? Just guessing at this point.

EPC-iOS uses the same mechanism as the legacy system – that is, both are checking for opt-in from the same place.

I've added those ids/events to the spreadsheet but it looks like most of these users were not getting tracked on other tables and none of them show up in mobilewikiappdailystats

Does MobileWikiAppDailyStats use a sampling rate on iOS?

SNowick_WMF added a comment.EditedOct 27 2020, 1:05 AM

Does MobileWikiAppDailyStats use a sampling rate on iOS?

@mpopov As far as I can tell it does not, but we definitely only get a sub-selection of actual dau.

mpopov closed this task as Resolved.Oct 28 2020, 8:36 PM

Thank you @SNowick_WMF for thoroughly looking into this! Let's call it done for now because there's not much else we'll be able to learn about the differences until the revision (v2) goes live with persistent storage (our prime suspect).

We will continue monitoring the counts and see how things look with v2.

Noting here since the question of allowable differences in data between Legacy and MEP tables has been raised. @mpopov and I discussed allowing a 2.5% variance in counts as passable for data between sets of tables. We arrived at 2.5 as reasonable because some variance is to be expected but anything above 2.5 may show where data collection is possibly due to bugs, based on experience and expectations.