As a WME Engineer, I want to document an approach for fetching data from the WMF Data Lake to our side.
We are building a pageviews dataset on top of existing data on the WMF side. This is meant to simplify ingestion and move us closer to near-real-time batch processing. Instead of having to download hourly batches, we are looking into a more flexible approach.
The goal here is only to produce a document that supports internal discussion, as well as discussion with the Data Platform Team.
Acceptance Criteria
- Documented approach
- Presented to team