
[REQUEST] Notebook for testing with wmfdata
Closed, Duplicate
Public

Description

Context: 2021-11-24 PA⇄DE sync meeting

The Data-Engineering team would like Product-Analytics to provide a Jupyter notebook that they (DE) can execute to test configuration & stack changes and check if those changes break end-user querying capability.

There should be at least one query run with wmfdata's spark module.
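As a rough illustration of what such a notebook cell might look like, here is a minimal sketch. The helper `smoke_test` and the query `SELECT 1 AS ok` are hypothetical and not part of wmfdata. Each engine is passed in as a plain callable, so the check itself does not depend on any particular wmfdata signature. The commented usage assumes `wmfdata.spark.run` accepts a SQL string and returns a result with a length, such as a DataFrame.

```python
from typing import Any, Callable, Dict

# Hypothetical trivial query; a real notebook would likely hit an actual table.
SMOKE_QUERY = "SELECT 1 AS ok"


def smoke_test(
    engines: Dict[str, Callable[[str], Any]],
    query: str = SMOKE_QUERY,
) -> Dict[str, str]:
    """Run `query` through each engine and report a status string per engine."""
    results: Dict[str, str] = {}
    for name, run in engines.items():
        try:
            rows = run(query)
            # Any non-empty result counts as a pass for this basic check.
            results[name] = "ok" if len(rows) > 0 else "empty result"
        except Exception as exc:
            # Record the failure instead of stopping, so every engine is checked.
            results[name] = f"failed: {exc!r}"
    return results


# Assumed usage inside the notebook (requires wmfdata and cluster access):
#   import wmfdata as wmf
#   print(smoke_test({"spark": wmf.spark.run}))
```

Passing the engines in as a dictionary keeps the door open for checking more than one engine from the same notebook without changing the helper.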

Event Timeline

mpopov triaged this task as Medium priority. Nov 24 2021, 5:15 PM
mpopov created this task.

@BTullis: would it make sense to have it query via Hive as well? (By the way, wmfdata's hive module is a wrapper for the Hive CLI.) If yes, would you rather have two separate notebooks, one for each engine?

I already have an open pull request for this in the wmfdata-python repo (although it hasn't been touched in about 10 months). I'll work on getting that finished in the other ticket!