Bug in current data retrieval script
Closed, Resolved (Public)

Description

Two new datasets (first_visits_country.tsv and last_action_country.tsv) generated by golden/portal/portal.R for T138107 contain bugs:

Screen Shot 2016-09-22 at 2.18.40 PM.png (343×649 px, 34 KB)

The current portal.R script worked fine when we backfilled the data. However, those datasets got messed up when they were updated one day later. One suspicion is a conflict between those data frames and the function wmf::rewrite_conditional.

Event Timeline

Change 312440 had a related patch set uploaded (by Chelsyx):
Ungroup last_action_country and first_visits_country

https://gerrit.wikimedia.org/r/312440

Change 312440 merged by Chelsyx:
Ungroup last_action_country and first_visits_country

https://gerrit.wikimedia.org/r/312440

I checked the function wmf::rewrite_conditional and the sink file run.Rout at /a/discovery/golden, but couldn't find anything suspicious. Since most_common_country.tsv had no problems when updated, and "ungroup()" seems to be the main difference between it and the two problematic data frames, I ungrouped last_action_country and first_visits_country, which fixed the bug. More investigation is needed later to figure out why wmf::rewrite_conditional misbehaves when given a grouped data frame.
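For anyone reading along, the workaround amounts to something like the sketch below. This is not the actual portal.R code: the `events` data frame, its columns, and the `n_events` summary are made up for illustration, and it assumes the datasets are built with dplyr. The point is only that `summarize()` after a multi-column `group_by()` still returns a grouped result, so an explicit `ungroup()` is needed before the data is handed off.

```r
library(dplyr)

# Hypothetical stand-in for the event data aggregated in portal.R
events <- data.frame(
  date = as.Date(c("2016-09-20", "2016-09-20", "2016-09-20")),
  country = c("US", "US", "DE"),
  action = c("search", "search", "clickthrough"),
  stringsAsFactors = FALSE
)

# summarize() drops only the last grouping level, so the result would
# still be grouped by date and country without the final ungroup()
last_action_country <- events %>%
  group_by(date, country, action) %>%
  summarize(n_events = n()) %>%
  ungroup()

# FALSE once ungroup() has been applied
is_grouped_df(last_action_country)
```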

Looks like all we need to do is have rewrite_conditional check whether the incoming data frame is grouped and ungroup it if it is, since the problem very likely comes from trying to rbind a grouped df (incoming) with an ungrouped df (existing data that is read).
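A rough sketch of that kind of guard, assuming dplyr is available. `ensure_ungrouped` is a hypothetical helper name and the two data frames are invented examples; this is not the actual body of wmf::rewrite_conditional, just the shape of the check.

```r
library(dplyr)

# Hypothetical helper: return the data frame without grouping metadata
ensure_ungrouped <- function(x) {
  if (dplyr::is_grouped_df(x)) {
    x <- dplyr::ungroup(x)
  }
  x
}

# Incoming data arrives grouped (invented example values)
incoming <- data.frame(
  country = c("US", "DE"),
  events = c(10L, 4L),
  stringsAsFactors = FALSE
) %>%
  group_by(country)

# Existing data as it would look after being read back from a TSV
existing <- data.frame(country = "FR", events = 7L, stringsAsFactors = FALSE)

# Strip the grouping before binding so both inputs are plain data frames
combined <- rbind(existing, as.data.frame(ensure_ungrouped(incoming)))
```

Doing the check inside the writing function means individual scripts don't have to remember to call ungroup() on every dataset they produce.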

Change 314458 had a related patch set uploaded (by Bearloga):
Ungroup grouped data frames

https://gerrit.wikimedia.org/r/314458

Change 314458 merged by Chelsyx:
Ungroup grouped data frames

https://gerrit.wikimedia.org/r/314458