
MobileWikiAppiOSUserHistory sending incompatible data
Closed, Resolved (Public)

Description

In this metawiki schema change, the type of device_level_enabled was changed from boolean to string, which is not a backwards-compatible change. As a result, data is failing ingestion into Hive and alerts are firing.

Possible solutions:

  1. Revert the schema and code change, and stop sending string values for device_level_enabled.
  2. Manually alter the Hive table event.device_level_enabled field to a string. This will likely cause any old data in Hive to be unreadable or corrupted.

Event Timeline

Hi @Ottomata, this is a new field we are tracking in this schema so we don't expect any old data. The data type in the schema was initially set to boolean but we weren't using that column yet and did not send any data to the schema. Are there more steps we need to take to enable tracking this value in the schema?

Hm, something must have sent this data with a boolean value initially. That's the only way the table would have been created this way.

Field types really aren't supposed to ever change. In the future, if you need to do this, you should add a new field.
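As a rough illustration of the rule above, a check like the following could flag type changes before a schema revision is deployed. This is only a sketch: it assumes each revision is available as a JSON-schema-style `properties` dict with a `type` per field, and the function name and sample revisions are hypothetical, not part of any existing tooling.

```python
def incompatible_changes(old_props, new_props):
    """Return {field: (old_type, new_type)} for fields whose type changed.

    Fields that exist only in the new revision are treated as compatible,
    since adding a field is the recommended way to evolve a schema.
    """
    changed = {}
    for name in old_props.keys() & new_props.keys():
        old_type = old_props[name].get("type")
        new_type = new_props[name].get("type")
        if old_type != new_type:
            changed[name] = (old_type, new_type)
    return changed


# Hypothetical revisions, reduced to the fields relevant to this task.
old_revision = {
    "inbox_count": {"type": "integer"},
    "device_level_enabled": {"type": "boolean"},
}
new_revision = {
    "inbox_count": {"type": "integer"},
    "device_level_enabled": {"type": "string"},
}

print(incompatible_changes(old_revision, new_revision))
```

With this check, adding a new string field under a different name would pass, while changing device_level_enabled in place is reported.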

We can manually alter the field type of the Hive table to string, and I think things will mostly work. I'm not totally sure what happens to old data though. If there are very few records with this as a boolean, the only thing that I think will break is if you try to select this field for hour partitions that have boolean values. I'm not totally sure though; it is possible that selecting anything in older partitions will break.

Mentioned in SAL (#wikimedia-analytics) [2022-01-07T20:16:39Z] <ottomata> altering hive table MobileWikiAppiOSUserHistory field event.device_level_enabled to string - T298721

Manually alter the Hive table event.device_level_enabled field to a string. This will likely cause any old data in Hive to be unreadable or corrupted.

Done.


ALTER TABLE `event.MobileWikiAppiOSUserHistory` CHANGE `event` `event` struct<app_install_id:string,event_dt:string,is_anon:boolean,measure_font_size:bigint,measure_readinglist_itemcount:bigint,measure_readinglist_listcount:bigint,primary_language:string,readinglist_showdefault:boolean,readinglist_sync:boolean,session_id:string,theme:string,feed_disabled:boolean,feed_enabled_list:struct<cr:boolean,fa:struct<`on`:array<string>,off:array<string>>,ns:struct<`on`:array<string>,off:array<string>>,od:struct<`on`:array<string>,off:array<string>>,pd:boolean,pl:struct<`on`:array<string>,off:array<string>>,rd:struct<`on`:array<string>,off:array<string>>,rp:boolean,tr:struct<`on`:array<string>,off:array<string>>>,search_tab:boolean,trend_notify:boolean,test_group:string,inbox_count:bigint,device_level_enabled:string>

(Note that I had to backtick-quote the fields named on in the ALTER statement, as on seems to be a Hive keyword.)

hive (event)> show create table MobileWikiAppiOSUserHistory;
OK
createtab_stmt
CREATE EXTERNAL TABLE `MobileWikiAppiOSUserHistory`(
  `dt` string,
  `event` struct<app_install_id:string,event_dt:string,is_anon:boolean,measure_font_size:bigint,measure_readinglist_itemcount:bigint,measure_readinglist_listcount:bigint,primary_language:string,readinglist_showdefault:boolean,readinglist_sync:boolean,session_id:string,theme:string,feed_disabled:boolean,feed_enabled_list:struct<cr:boolean,fa:struct<on:array<string>,off:array<string>>,ns:struct<on:array<string>,off:array<string>>,od:struct<on:array<string>,off:array<string>>,pd:boolean,pl:struct<on:array<string>,off:array<string>>,rd:struct<on:array<string>,off:array<string>>,rp:boolean,tr:struct<on:array<string>,off:array<string>>>,search_tab:boolean,trend_notify:boolean,test_group:string,inbox_count:bigint,device_level_enabled:string>,
...

Thanks @Ottomata - noting here for posterity that I am able to extract data from prior to 2022-01-08 using Hue/Hive but any partition prior to 2022-01-08 gives an error using Superset. I have archived the data prior to 2022-01-08 going back 90 days (using Spark) in case we need it so we are ok with any old data being inaccessible in Superset. This should be resolved and no further action needed.

Presto Error
presto error: There is a mismatch between the table and partition schemas. The types are incompatible and cannot be coerced. The column 'event' in table 'event.mobilewikiappiosuserhistory' is declared as type 'struct<app_install_id:string,event_dt:string,is_anon:boolean,measure_font_size:bigint,measure_readinglist_itemcount:bigint,measure_readinglist_listcount:bigint,primary_language:string,readinglist_showdefault:boolean,readinglist_sync:boolean,session_id:string,theme:string,feed_disabled:boolean,feed_enabled_list:struct<cr:boolean,fa:struct<on:array<string>,off:array<string>>,ns:struct<on:array<string>,off:array<string>>,od:struct<on:array<string>,off:array<string>>,pd:boolean,pl:struct<on:array<string>,off:array<string>>,rd:struct<on:array<string>,off:array<string>>,rp:boolean,tr:struct<on:array<string>,off:array<string>>>,search_tab:boolean,trend_notify:boolean,test_group:string,inbox_count:bigint,device_level_enabled:string>', but partition 'year=2022/month=1/day=7/hour=9' declared column 'event' as type 'struct<app_install_id:string,event_dt:string,is_anon:boolean,measure_font_size:bigint,measure_readinglist_itemcount:bigint,measure_readinglist_listcount:bigint,primary_language:string,readinglist_showdefault:boolean,readinglist_sync:boolean,session_id:string,theme:string,feed_disabled:boolean,feed_enabled_list:struct<cr:boolean,fa:struct<on:array<string>,off:array<string>>,ns:struct<on:array<string>,off:array<string>>,od:struct<on:array<string>,off:array<string>>,pd:boolean,pl:struct<on:array<string>,off:array<string>>,rd:struct<on:array<string>,off:array<string>>,rp:boolean,tr:struct<on:array<string>,off:array<string>>>,search_tab:boolean,trend_notify:boolean,test_group:string,inbox_count:bigint,device_level_enabled:boolean>'.


This may be triggered by:
Issue 1002 - The database returned an unexpected error.
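
The two struct types in the Presto error are long and differ in a single field, which is hard to spot by eye. A small helper along these lines could locate the difference. It is a sketch only: the function names are hypothetical, and the parser assumes the `struct<name:type,...>` syntax seen in the error message, with optional backtick-quoted field names.

```python
def split_top_level(s, sep=","):
    """Split s on sep, ignoring separators nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts


def flatten(type_str, prefix=""):
    """Map dotted field paths to leaf types for a Hive struct<...> string."""
    type_str = type_str.strip()
    if not (type_str.startswith("struct<") and type_str.endswith(">")):
        return {prefix: type_str}  # leaf type, e.g. boolean or array<string>
    fields = {}
    for part in split_top_level(type_str[len("struct<"):-1]):
        name, _, field_type = part.partition(":")
        name = name.strip().strip("`")  # drop backtick quoting, e.g. `on`
        path = f"{prefix}.{name}" if prefix else name
        fields.update(flatten(field_type, path))
    return fields


def diff_types(table_type, partition_type):
    """Return {path: (table_leaf_type, partition_leaf_type)} for mismatches."""
    a, b = flatten(table_type), flatten(partition_type)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


# Shortened versions of the two types from the error message.
table_type = (
    "struct<is_anon:boolean,feed_enabled_list:struct<cr:boolean,"
    "fa:struct<`on`:array<string>,off:array<string>>>,"
    "device_level_enabled:string>"
)
partition_type = (
    "struct<is_anon:boolean,feed_enabled_list:struct<cr:boolean,"
    "fa:struct<on:array<string>,off:array<string>>>,"
    "device_level_enabled:boolean>"
)

print(diff_types(table_type, partition_type))
```

Pasting the full table and partition types from the error message in place of the shortened strings should report device_level_enabled as the only mismatch.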
odimitrijevic claimed this task.
odimitrijevic moved this task from Incoming (new tickets) to Ops Week on the Data-Engineering board.
odimitrijevic subscribed.

Based on the conversation in Slack, I think this is done. Please reopen if not.