
Migrate format of pageview dumps to replace two or more consecutive zeros by char caret and then the count of the zeros
Closed, Declined Public Feature

Description

Feature summary:
Dumps:

Convert the pageview dumps from the current format [1] to D0CMF.

Disadvantages of the current format:

  • interval with a maximum of 31 values,
  • a user who wants data for a single wiki must currently download the data for all wikis.

Benefits of new format would be:

  • better user convenience,
  • the possibility of grouping data (titles by local wiki),
  • less data to store,
  • less data to transfer.

D0CMF:
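The task title describes the core idea as replacing each run of two or more consecutive zeros with a caret followed by the number of zeros. The following is only a minimal sketch of that idea in Python, not the D0CMF specification: the function names `encode_zero_runs` and `decode_zero_runs` are hypothetical, and joining the tokens with commas is an assumption made for illustration.

```python
from itertools import groupby


def encode_zero_runs(counts):
    """Collapse each run of two or more zeros into a single '^N' token.

    Hypothetical helper: illustrates the rule from the task title only.
    """
    tokens = []
    for value, run in groupby(counts):
        length = len(list(run))
        if value == 0 and length >= 2:
            # Two or more consecutive zeros become caret + run length.
            tokens.append(f"^{length}")
        else:
            # Non-zero values and single zeros are kept as they are.
            tokens.extend([str(value)] * length)
    return tokens


def decode_zero_runs(tokens):
    """Expand '^N' tokens back into N zeros (inverse of encode_zero_runs)."""
    counts = []
    for token in tokens:
        if token.startswith("^"):
            counts.extend([0] * int(token[1:]))
        else:
            counts.append(int(token))
    return counts


daily = [3, 0, 0, 0, 0, 7, 0, 1, 0, 0]
encoded = encode_zero_runs(daily)
# The comma separator is an assumption, not part of the proposal text.
print(",".join(encoded))  # 3,^4,7,0,1,^2
assert decode_zero_runs(encoded) == daily
```

A single zero is left unchanged because `^1` would be longer than `0`, which matches the "two or more" wording of the title.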

[1] https://dumps.wikimedia.org/other/pageview_complete/readme.html, section "Data format".

Event Timeline

Aklapper changed the task status from Open to Stalled. May 30 2025, 11:52 AM

Migration of pageview dumps to D0CMF

Migration from what currently?

plus new dump split do files (§1) by local wikipedia, merge (§2) by day, month or year and number pageview data store (§3) in D0CMF.

Sorry, I cannot parse this sentence.

Benefits:

What are the disadvantages?

ex. separe local wiki,

I do not know what that means.

@Dusan_Krehel: This implies that all external software reading these Pageview datadumps needs to be rewritten to handle the proposed new format?

Aklapper renamed this task from Migration of pageview dumps to D0CMF to Migrate format of pageview dumps to replace two or more consecutive zeros by char caret and then the count of the zeros. May 30 2025, 7:57 PM
Aklapper changed the task status from Stalled to Open.

@Aklapper Yes, it requires an update to the software that consumes the dumps.

@Dusan_Krehel Thank you for sharing the D0CMF pageview dump format proposal. I appreciate the detailed analysis of the current format's limitations and the potential benefits of the new approach.

After reviewing the proposal, I don't think this is something we can move forward with at this time. While the concept of per-wiki downloads and improved storage efficiency has merit, there are several considerations that make this challenging to implement in our current roadmap:

  • The migration would require significant coordination across multiple teams, substantial testing with our existing user base, and careful planning around backwards compatibility.
  • Given our current priorities and resource constraints, we're not in a position to take on a project of this scope.

I'd encourage you to continue developing the format and gathering feedback from the community. If there's strong user demand and the technical approach matures further, this could be something to revisit in future planning cycles.

Thank you again for the thoughtful proposal and the work you've put into analyzing this problem.