Import page_props table to Hive
Closed, Resolved, Public

Description

It would be great to have the page_props table from the SQL replicas in Hive, so that it's easily joinable with other tables like mediawiki_page or mediawiki_imagelinks.
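To make the requested join concrete, here is a minimal sketch of the kind of query this would enable. It rests on several assumptions: an in-memory SQLite database stands in for Hive so the example runs anywhere, the table names `mediawiki_page` and `mediawiki_page_props` are hypothetical, the columns follow the standard MediaWiki schema (`page_id`, `page_title`, `pp_page`, `pp_propname`, `pp_value`), and the rows are made-up toy data. The actual Hive tables may differ, for example by carrying extra partition columns.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; table names are hypothetical.
# Column names follow the MediaWiki schema for page and page_props.
JOIN_QUERY = """
SELECT p.page_id, p.page_title, pp.pp_value
FROM mediawiki_page AS p
JOIN mediawiki_page_props AS pp
  ON pp.pp_page = p.page_id
WHERE pp.pp_propname = ?
ORDER BY p.page_id
"""


def build_toy_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE mediawiki_page (
            page_id INTEGER, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE mediawiki_page_props (
            pp_page INTEGER, pp_propname TEXT, pp_value TEXT);
        """
    )
    # Toy rows, invented for illustration only.
    conn.executemany(
        "INSERT INTO mediawiki_page VALUES (?, ?, ?)",
        [(1, 0, "Example_A"), (2, 0, "Example_B")],
    )
    conn.executemany(
        "INSERT INTO mediawiki_page_props VALUES (?, ?, ?)",
        [(1, "page_image_free", "Example_A.jpg"), (2, "wikibase_item", "Q0")],
    )
    return conn


def pages_with_prop(conn, propname):
    """Return (page_id, page_title, pp_value) for pages carrying propname."""
    return conn.execute(JOIN_QUERY, (propname,)).fetchall()


rows = pages_with_prop(build_toy_db(), "page_image_free")
print(rows)
```

In Hive the same join would be written directly in HiveQL, with the property name inlined or passed as a variable, since the `?` placeholder here is specific to the Python database API.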

Event Timeline

Milimetric moved this task from Incoming to Datasets on the Analytics board.

@Nuria -- the Growth and Android teams are both currently prototyping an image suggestion algorithm with @Miriam in T256081: Image matching algorithm. This task would unlock a potential route to increased accuracy in the algorithm. Could you please give us a sense of the level of work needed to add the table to Hive? Is it quick and easy, or something we would need to plan for? Thank you!

Change 628770 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Add page_props and user_properties to sqoopable tables

https://gerrit.wikimedia.org/r/628770

JAllemandou added a project: Analytics-Kanban.
JAllemandou set Final Story Points to 3.
JAllemandou moved this task from Next Up to In Code Review on the Analytics-Kanban board.

@MMiller_WMF the change is in the works and will be effective with the next mediawiki snapshot.

Change 628770 merged by Joal:
[analytics/refinery@master] Add page_props & user_properties to sqoop/hive/oozie

https://gerrit.wikimedia.org/r/628770

@Nuria -- thank you! That will be in about two weeks I guess. @Miriam FYI.

Change 629070 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Add page_props and user_properties to analytics sqoop

https://gerrit.wikimedia.org/r/629070

@MMiller_WMF we missed this month's deploy of this change. Would it be OK to wait for the run of November 1st, or do you need it sooner?

Change 629070 merged by Elukey:
[operations/puppet@production] Add page_props and user_properties to analytics sqoop

https://gerrit.wikimedia.org/r/629070

The data has been manually imported to solve the data-dependency problem. It is available from now on.

Oh okay, great! @Miriam -- could you please check to see that the data you need is there? Then we can resolve the task.