pagecounts-ez uploads stopped after 9/24
Closed, Resolved, Public

Description

https://dumps.wikimedia.org/other/pagecounts-ez/merged/2020/2020-09/

There are no dumps in pagecounts-ez/merged newer than 9/24. It looks like they were being produced and uploaded daily until 9/25, when the uploads stopped. Is there a known ongoing issue?

Event Timeline

Documentation for the new data is on its way, and we'll announce it on the mailing lists once we've vetted the new data and it's ready.

I didn't find the monthly totals in those files. Will they no longer be provided? I have some tools that use the total pagecounts per month per article, which is the only data I need from the pagecounts files.

The monthly totals are not available yet, see T265732. A quick status update:

  • we found some malformed rows in pageview_complete, and we're fixing them
    • some had 5 columns instead of 6, because they're missing a page_id
    • some had fewer than 5 columns because line breaks and carriage returns in the page title split the rows across lines; we're fixing those too

If you're parsing through these files, the recommended actions are to:

  • keep the 5-column lines and assign page_id = None / null
  • throw out any lines with fewer than 5 columns
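
For anyone scripting this, here is a minimal Python sketch of the two recommendations above. It is not the official parser, and it rests on assumptions this thread does not confirm: that fields are separated by single spaces, and that page_id is the third column of a full 6-column row. The names parse_row, EXPECTED_COLUMNS and PAGE_ID_INDEX are made up for illustration, so adjust them to match the files you actually have.

```python
from typing import List, Optional

EXPECTED_COLUMNS = 6   # a well-formed row, per the status update above
PAGE_ID_INDEX = 2      # ASSUMPTION: page_id is the third column


def parse_row(line: str, sep: str = " ") -> Optional[List[Optional[str]]]:
    """Return the row as a 6-item list, or None if the row should be dropped.

    ASSUMPTION: fields are separated by a single space (sep).
    """
    fields = line.rstrip("\r\n").split(sep)

    if len(fields) == EXPECTED_COLUMNS:
        # Well-formed row: keep as-is.
        return fields

    if len(fields) == EXPECTED_COLUMNS - 1:
        # 5-column row: page_id is missing, so insert None where it would be.
        return fields[:PAGE_ID_INDEX] + [None] + fields[PAGE_ID_INDEX:]

    # Fewer than 5 columns: broken by stray line breaks, throw it out.
    # Rows with more than 6 columns are not covered in this thread;
    # dropping them too is this sketch's own choice.
    return None
```

With made-up sample values, parse_row("en.wikipedia Main_Page desktop 10 A10") comes back with None in the third slot, and a short fragment such as "en.wikipedia Main" comes back as None so the caller can skip it.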