
Pageviews ingestion Investigation [10% time]
Open, Low, Public

Description

As a WME engineer, I want to document an approach for fetching data from the WMF data lake to our side.

We are building a pageviews dataset on top of existing data on the WMF side. This is to simplify ingestion and get closer to near-real-time batch processing. Instead of having to download hourly batches, we are looking into a more flexible approach.

The goal here is just a document to support internal discussion and discussions with the Data Platform Team.

Acceptance Criteria

  • Documented approach
  • Presented to team

Event Timeline

FNavas-foundation renamed this task from [stub] pageviews infra ideas ricardo to Prep infra ideas for receiving pageviews from wmf. May 28 2025, 5:51 PM
FNavas-foundation updated the task description.
REsquito-WMF renamed this task from Prep infra ideas for receiving pageviews from wmf to Pageviews ingestion Investigation. May 29 2025, 11:29 AM
REsquito-WMF triaged this task as Low priority.
REsquito-WMF updated the task description.
REsquito-WMF renamed this task from Pageviews ingestion Investigation to Pageviews ingestion Investigation [10% time]. May 29 2025, 2:34 PM

As part of this work, we have requested resources on the DPE team's side. The ticket is here: https://phabricator.wikimedia.org/T396672

JArguello-WMF removed LDlulisa-WMF as the assignee of this task.
JArguello-WMF added a subscriber: LDlulisa-WMF.

@FNavas-foundation we'll discuss this ticket after Hal's presentation on July 7th

@LDlulisa-WMF - we should kill this ticket; it's a duplicate of your current investigation, correct? (cc @JArguello-WMF)