
Fix mediawiki_page_restrictions_change table after migration to eventgate-main and schema version 1.0.0
Closed, Resolved | Public | 3 Estimated Story Points

Description

In the 1.0.0 page-restrictions-change schema created for the migration to EventGate, I changed the page_restrictions field from the unsupported patternProperties to our (newly) supported additionalProperties { type: string }, AKA map type.

This works, except that the mediawiki_page_restrictions_change Hive table already has the page_restrictions field as a struct. I think this table should probably have been blacklisted from Refine previously. It will work fine now, though.
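To make the mismatch concrete, here is a rough sketch of the two schema styles as Python dicts, plus a helper that names the Hive column type a map-style field would correspond to. The pattern `^.+$`, the helper names, and the JSON-to-Hive type table are all assumptions for illustration; this is not Refine's actual code.

```python
# Hypothetical fragments: the real pattern used in the old schema is not shown in this task.
OLD_STYLE = {
    "type": "object",
    "patternProperties": {"^.+$": {"type": "string"}},
}
NEW_STYLE = {
    "type": "object",
    "additionalProperties": {"type": "string"},
}

# Assumed mapping from JSON Schema value types to Hive types (only what this example needs).
JSON_TO_HIVE = {"string": "string"}


def is_map_type(field_schema):
    """True if an object schema declares one uniform value type via additionalProperties."""
    extra = field_schema.get("additionalProperties")
    return (
        field_schema.get("type") == "object"
        and isinstance(extra, dict)
        and "type" in extra
    )


def hive_map_type(field_schema):
    """Return the Hive map type string for a map-style field schema."""
    if not is_map_type(field_schema):
        raise ValueError("not a map-type schema")
    value_type = field_schema["additionalProperties"]["type"]
    return "map<string,{}>".format(JSON_TO_HIVE[value_type])


print(is_map_type(OLD_STYLE))    # False
print(hive_map_type(NEW_STYLE))  # map<string,string>
```

A table whose page_restrictions column was already created as a struct cannot simply accept map<string,string> data in that column, which is the conflict described above.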

We either need to drop all existing Hive data and start from scratch (fine with me, as AFAIK no one uses this table, and it wasn't really compatible with Refine in the first place...), OR convert all the existing data to the new schema. I'm not entirely sure how to do the conversion from struct to map in a Hive query, so we'll probably need to do a transform in Spark.
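If we did go the conversion route, the per-record logic would be small. Below is a minimal sketch in plain Python of turning a struct-like value (fixed fields, some of them null) into a string map. The field names `edit` and `move`, the sample values, and the choice to drop null fields are assumptions; in Spark this logic would have to be wrapped in a UDF or rewritten with built-in column functions, which is not shown here.

```python
from typing import Dict, Optional


def struct_to_map(struct_row: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Convert a struct-like record into a map, keeping only non-null fields."""
    return {key: str(value) for key, value in struct_row.items() if value is not None}


# Hypothetical contents of an old page_restrictions struct value.
old_value = {"edit": "sysop", "move": None}
new_value = struct_to_map(old_value)
print(new_value)  # {'edit': 'sysop'}
```

Dropping nulls is a design choice: a struct always carries every field, while a map only needs the keys that are actually set.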

Event Timeline

I'm pretty sure no one is using this table, so I'm going to make a decision to move the old data out of the way and start fresh. I'll keep the old data in a renamed table for a while, until we all discuss. If we decide to backfill, we can still do it.

Mentioned in SAL (#wikimedia-analytics) [2019-06-19T13:41:07Z] <ottomata> renaming event.mediawiki_page_restrictions_change to event.mediawiki_page_restrictions_change_T226051 - T226051

sudo -u analytics hdfs dfs -mv /wmf/data/event/mediawiki_page_restrictions_change /wmf/data/event/mediawiki_page_restrictions_change_T226051
ALTER TABLE event.mediawiki_page_restrictions_change RENAME TO event.mediawiki_page_restrictions_change_T226051;
ALTER TABLE event.mediawiki_page_restrictions_change_T226051 SET LOCATION "hdfs://analytics-hadoop/wmf/data/event/mediawiki_page_restrictions_change_T226051";

Ottomata set the point value for this task to 3.
Ottomata moved this task from Next Up to Done on the Analytics-Kanban board.

Refine: Successfully refined 14 of 14 dataset partitions into table event.mediawiki_page_restrictions_change (total # refined records: 175)

Milimetric moved this task from In Code Review to Done on the Analytics-Kanban board.
Milimetric subscribed.

For the record: we decided to drop the old table, event.mediawiki_page_restrictions_change_T226051.

hive (event)> drop table event.mediawiki_page_restrictions_change_t226051;

[@stat1004:/mnt/hdfs/wmf/data/event] $ sudo -u analytics hdfs dfs -rm -R /wmf/data/event/mediawiki_page_restrictions_change_T226051
19/06/20 16:43:29 INFO fs.TrashPolicyDefault: Moved: 'hdfs://analytics-hadoop/wmf/data/event/mediawiki_page_restrictions_change_T226051' to trash at: hdfs://analytics-hadoop/user/analytics/.Trash/Current/wmf/data/event/mediawiki_page_restrictions_change_T226051