
[BUG] Logging error of MobileWikiAppDailyStats for the iOS app
Closed, Resolved · Public · Bug Report

Description

The current iOS app (version 6.2.3.1612) is sending events to the MobileWikiAppDailyStats schema (revision 17984412). Example event JSON:

{"revision":17984412,"event":{"ts":"2019-06-20T14:53:23-07:00","is_anon":true,"appInstallAgeDays":0,"appInstallID":"5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E"},"schema":"MobileWikiAppDailyStats","wiki":"enwiki"}

I've verified that the events are sent correctly from the client side, that they are logged correctly in MariaDB, and that no errors are logged in the event.eventerror table in Hive. However, these events are logged incorrectly in the event.mobilewikiappdailystats table in Hive: the appinstallid and ts fields are both null.
I suspect this problem results from the newer revision of this schema being backward incompatible: the newer revision 18111418 requires the languages, app_install_id and client_dt fields.
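To make the incompatibility concrete, here is a minimal Python sketch (not EventLogging's real validator). It parses the example event and lists which of the fields required by revision 18111418 are missing. It assumes the three fields named above are the only required ones; the real schema may require more.

```python
import json

# Example event from this report (iOS app 6.2.3.1612, revision 17984412).
RAW_EVENT = (
    '{"revision":17984412,"event":{"ts":"2019-06-20T14:53:23-07:00",'
    '"is_anon":true,"appInstallAgeDays":0,'
    '"appInstallID":"5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E"},'
    '"schema":"MobileWikiAppDailyStats","wiki":"enwiki"}'
)

# Assumption: only the required fields named in this task are listed;
# the real revision 18111418 may require additional ones.
REQUIRED_BY_REVISION = {
    18111418: {"languages", "app_install_id", "client_dt"},
}


def missing_required(event_payload, revision):
    """Return, sorted, the required fields of `revision` absent from the payload."""
    required = REQUIRED_BY_REVISION[revision]
    return sorted(required - set(event_payload))


capsule = json.loads(RAW_EVENT)
print(capsule["revision"], missing_required(capsule["event"], 18111418))
# 17984412 ['app_install_id', 'client_dt', 'languages']
```

Every field the newer revision requires is absent from the old-revision payload. An old event therefore cannot satisfy the new schema.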

Queries:

-- event.mobilewikiappdailystats table in Hive
select *
from mobilewikiappdailystats
where year=2019 and month = 6 and day = 1
and useragent.wmf_app_version = "6.2.3.1612"
and useragent.os_family = "iOS"
limit 100

-- MobileWikiAppDailyStats_17984412 table in MariaDB
select *
from MobileWikiAppDailyStats_17984412
where userAgent like "%6.2.3.1612%"
and left(timestamp,8)='20190601'
limit 100

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Bug Report". · Jun 20 2019, 10:33 PM
Restricted Application added a subscriber: Aklapper.

Mmm, yeah: if records are inserted into the corresponding Hive tables, they will not get reported in eventerror. Now, there is something odd here: when we refine, we get the schema from the first record we find, and we always assume schemas are backwards compatible.

This event: {"revision":17984412,"event":{"ts":"2019-06-20T14:53:23-07:00","is_anon":true,"appInstallAgeDays":0,"appInstallID":"5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E"},"schema":"MobileWikiAppDailyStats","wiki":"enwiki"}
is not valid according to the latest version of the schema, so I would not expect it to be in the table at all, even with a null appInstallID.

mforns triaged this task as High priority.
mforns moved this task from Incoming to Operational Excellence on the Analytics board.


@Nuria, I think the problem is that this event is in MariaDB but not in Hive. Query in MariaDB:

select *
from MobileWikiAppDailyStats_17984412
where event_appInstallID = "5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E"
Result (one row):

id:                      19372622
uuid:                    34ad58c647615a34afe373e2ce20cfa8
dt:                      NULL
timestamp:               20190620215341
userAgent:               {"wmf_app_version": "6.2.3.1612", "os_minor": "3", "os_major": "12", "is_bot": false, "device_family": "Other", "os_family": "iOS", "browser_minor": null, "is_mediawiki": false, "browser_major": null, "browser_family": "Other"}
webHost:                 NULL
wiki:                    enwiki
event_appInstallAgeDays: 0
event_appInstallID:      5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E
event_ts:                2019-06-20T14:53:23-07:00
event_is_anon:           1

I'm wondering if this is an issue we can solve for Hive.

@chelsyx ah, sorry, that makes sense. MariaDB storage will accept non-backwards-compatible schema changes, but Hive will not: it is not possible to support non-backwards-compatible schemas in storage backed by Hadoop.

Now, I would expect events that do not validate against the latest version of the schema to be logged in eventerror, right? Pinging @Ottomata.


For EventLogging in Hive, we actually use the latest schema. The latest schema is used to read the data, so if the data has fields that are not in the latest schema, they will not be read. Removing fields is a backwards-incompatible change.
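Here is a minimal Python sketch of that read path. It uses a simplified model, not the actual Refine job: records are projected onto the latest schema's fields and then written into the Hive table's columns. The latest-revision field list and the Hive column list are hypothetical (only names mentioned in this task). `read_with_schema` and `to_hive_row` are illustrative helpers.

```python
# Payload of the revision-17984412 event from the task description.
OLD_EVENT = {
    "ts": "2019-06-20T14:53:23-07:00",
    "is_anon": True,
    "appInstallAgeDays": 0,
    "appInstallID": "5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E",
}

# Assumption: only the fields named in this task; the real revision
# 18111418 defines more properties than these.
LATEST_FIELDS = ["languages", "app_install_id", "client_dt"]

# Hypothetical Hive columns: older ones (lowercased by Hive) plus new ones.
HIVE_COLUMNS = ["ts", "appinstallid", "languages", "app_install_id", "client_dt"]


def read_with_schema(record, schema_fields):
    """Keep only fields named in schema_fields (case-insensitive match);
    anything not in the schema is silently dropped."""
    by_lower = {k.lower(): v for k, v in record.items()}
    return {f: by_lower.get(f.lower()) for f in schema_fields}


def to_hive_row(parsed, columns):
    """Fill every Hive column; columns the parsed record lacks become None."""
    by_lower = {k.lower(): v for k, v in parsed.items()}
    return {c: by_lower.get(c.lower()) for c in columns}


row = to_hive_row(read_with_schema(OLD_EVENT, LATEST_FIELDS), HIVE_COLUMNS)
print(row["appinstallid"], row["ts"])  # None None
```

In this model, `appInstallID` and `ts` are not in the latest schema, so they are dropped at read time. The corresponding Hive columns end up null, which matches what the report shows in event.mobilewikiappdailystats.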

Hm, actually, I think I can fix this behavior. We still shouldn't make backwards-incompatible changes, but I think Refine should probably merge the latest schema with the Hive table schema before reading data. That way it would keep, and use when reading the data, fields from the Hive table that are no longer in the EventLogging schema.
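The proposed merge could look roughly like the following Python sketch. It takes the union of the Hive table's columns and the latest schema's fields, then reads the record with the merged list. This is an assumption about the approach described above, not necessarily the change that was made to Refine. The column lists and helper names are hypothetical.

```python
OLD_EVENT = {
    "ts": "2019-06-20T14:53:23-07:00",
    "is_anon": True,
    "appInstallAgeDays": 0,
    "appInstallID": "5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E",
}

# Hypothetical: columns already in the Hive table (lowercased by Hive).
HIVE_COLUMNS = ["ts", "is_anon", "appinstallagedays", "appinstallid"]
# Hypothetical: fields of the latest schema revision named in this task.
LATEST_FIELDS = ["languages", "app_install_id", "client_dt"]


def merge_fields(hive_columns, latest_fields):
    """Hive columns first, then latest-schema fields not already present
    (duplicates detected case-insensitively)."""
    merged = list(hive_columns)
    seen = {c.lower() for c in hive_columns}
    for field in latest_fields:
        if field.lower() not in seen:
            merged.append(field)
            seen.add(field.lower())
    return merged


def read_with_schema(record, schema_fields):
    """Project a record onto schema_fields (case-insensitive);
    fields missing from the record become None."""
    by_lower = {k.lower(): v for k, v in record.items()}
    return {f: by_lower.get(f.lower()) for f in schema_fields}


merged = merge_fields(HIVE_COLUMNS, LATEST_FIELDS)
row = read_with_schema(OLD_EVENT, merged)
print(row["appinstallid"], row["client_dt"])
# 5A9CAA64-4A76-4E1D-ACA6-D6ECF6FE023E None
```

Putting the Hive columns first keeps the existing table layout. Matching names case-insensitively mirrors how Hive treats column names. Old events keep their values, and new-only fields are simply null for them.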

@chelsyx in the short term, you should edit your schema and re-add the fields you removed. You can mark them as deprecated (or similar) in their descriptions. That will enable Refine to find your old data.

Thanks @Ottomata and @Nuria!
Since this schema is used by both the iOS and Android teams, and the two teams are using different revisions (with different required fields), I will discuss this with @mpopov before making the change.

Mentioned in SAL (#wikimedia-analytics) [2019-07-18T18:34:41Z] <ottomata> backfilling MobileWikiAppDailyStats data since June 7 to populate missing fields (e.g. appinstallid) in refined data. - T226219