
<Platform Initiative> Dumps + WME Gap Analysis
Closed, Declined (Public)

Description

Request Status: New Request
Request Type: core technology support
Related OKRs: TGUC KR1

Request Title: Dumps + WME Gap Analysis

  • Request Description: What is the planned path forward regarding WME and Dumps products? Is there an opportunity to consolidate these two products for better functionality/maintenance coverage going forward?

We have WME Exports and our existing Dumps products, both of which include components for exporting our data for consumption by third parties.

From a product perspective, we would like to better understand who currently uses Dumps and which feature sets within Dumps they use. We would also like to know whether these are features we should add to WME, both to provide increased functionality and to move us towards a single product offering for data consumption use cases. Some supporting materials are listed below:

This is what OKAPI has
https://docs.google.com/document/d/1kZRkDiAAQ83WG1ukrVeIOB85Gob1UahlBpgp9VcAeA0/edit

And this is Dumps
https://upload.wikimedia.org/wikipedia/commons/3/33/Wikimedia_dumps_high_level_overview.pdf

OKAPI provides HTML Dumps with updates less than monthly.

Dumps contains all data, in multiple formats: XML, JSON, Wikitext
https://meta.wikimedia.org/wiki/Data_dumps

  • Main Requestors: Enterprise, Platform Leadership
  • Ideal Delivery Date: February 2022
  • Stakeholders: Widespread community impact; Research and other internal teams that leverage this data for analysis or product features

Request Documentation

Document Type | Required? | Document/Link
Related PHAB Tickets | Yes | <add links here>
Product One Pager | Yes | <add link here>
Product Requirements Document (PRD) | Yes | <add link here>
Product Roadmap | Yes | <add link here>
Product Planning/Business Case | No | <add link here>
Product Brief | No | <add link here>
Other Links | No | <add links here>

Event Timeline

MNadrofsky set Due Date to Dec 1 2021, 5:00 AM.

I have just seen this today, and I'm not sure how I can help. Please let me know. Thanks.

I also just saw this today, and it's something we've been talking about for a very long time. I'm happy to jump in on a meeting to discuss. There's a clear answer here, but it's broad, and I think it's crucial that we look at the whole picture.

DAbad renamed this task from Dumps + WME Gap Analysis to <Platform Initiative> Dumps + WME Gap Analysis. Dec 16 2021, 6:27 PM
DAbad triaged this task as Medium priority.
DAbad updated the task description.

2021-12-08 - Technology Steering Committee

  • In discussing this with @wdoran and the enterprise team, we agreed that work on this will not commence until new members are onboarded
  • January 2022 is the target start date for design jams with data engineering, PET, and enterprise

DAbad changed Due Date from Dec 1 2021, 5:00 AM to Feb 28 2022, 5:00 AM. Dec 16 2021, 6:33 PM

A meeting is set for Jan 18th with folks from Data Engineering, and one is to be scheduled with WME folks as well, to start talking about what would be needed for a common dumps architecture.

The meeting with WME folks has been moved to Jan 27th due to scheduling issues.

World's shortest summary of discussions from both meetings:

  • Everyone agreed that having one common platform that accommodates all three groups (Data Engineering, WME, PET/Dumps) would be grand.
  • Different groups have particular needs, e.g. access to private data via the databases or certain customer requirements, that would have to be covered by any common platform.
  • This is a project of very broad scope that will need solid support from upper management and dedicated resources before getting underway, even in the design phase.

Next up, we should have a short joint meeting to hammer out immediate next steps. Who wants to schedule it, @WDoranWMF ? (Thanks in advance :-P )

@DAbad: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

DAbad removed Due Date.
DAbad moved this task from Investigate to Backlog on the Foundational Technology Requests board.

Moving to the parking lot while we work on the events experiment, which will be input for this work.

@WDoranWMF am I right to assume that this is long since moot, superseded by various other things? If not, can the task be updated to reflect the current work left to do and who will be taking that on? If you are not the right person to answer this, perhaps you can redirect me to the right person. Thanks!

@ArielGlenn I checked with @lbowmaker and this is unlikely to progress, since we will likely have different routes to take as part of Dumps 2.0. My thought would be to close this with a link to the Dumps 2.0 board, unless you have any objections?

This work will be handled under the new Dumps 2.0 effort as part of the newly formed Data Platform Engineering Data Products team. Work can be followed on the Dumps 2.0 workboard.