Missing documentation for pageviews dataset
Closed, Resolved | Public | 2 Estimated Story Points

Event Timeline

It's there for me. I just checked by clicking on the above link. (I also tried going directly from the analytics index page to see if the link is different and broken there, but it worked for me from there too.)

Someone just added it (the file is dated to two hours ago). Thanks!

It still does not document the format, though, which is apparently not the same as the aggregated dump format (which does have documentation).

Ah ha! I hope whoever added the file will update it with the information you need. (I wonder who that kind person was?)

Instead of maintaining Readme documentation on dumps.wikimedia.org, we should link back to the corresponding documentation pages on Wikitech, which are more reliable and up to date. This is already done for e.g. pagecounts-raw (https://dumps.wikimedia.org/other/pagecounts-raw/).

Milimetric triaged this task as Medium priority.
Milimetric added a project: Analytics-Kanban.
Milimetric set the point value for this task to 2.
Milimetric moved this task from Incoming to Operational Excellence on the Analytics board.

Something really, really strange is happening. I added this file back in January: https://github.com/wikimedia/puppet/blob/53abe99dc8604f176e95ae7028efd6cf76cf6645/modules/dumps/manifests/web/html.pp#L58 (content here: https://github.com/wikimedia/puppet/blob/53abe99dc8604f176e95ae7028efd6cf76cf6645/modules/dumps/files/web/html/pageviews_readme.html), so I have no idea why it just showed up. There was no puppet change that I can see that would've done anything to it... But when I navigate to that link, sometimes it shows up and sometimes it doesn't. Maybe someone with better puppet-fu than me can take a look? @ArielGlenn?

Also, @Tbayer you're right, I'll update the content to link to the docs on Wikitech (it already links to the Research docs). But I think it's useful to have a friendly readme that doesn't get too specific (which this does not).

I chimed in on that Stack Overflow post, thanks @Tgr.

Change 452738 had a related patch set uploaded (by Milimetric; owner: Milimetric):
[operations/puppet@production] Add reference to Wikitech docs

https://gerrit.wikimedia.org/r/452738

Change 452738 merged by Ottomata:
[operations/puppet@production] Add reference to Wikitech docs

https://gerrit.wikimedia.org/r/452738

Change 452945 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Exclude readme.html from being deleted during dumps::web::fetches::stats jobs

https://gerrit.wikimedia.org/r/452945

Change 452945 merged by Ottomata:
[operations/puppet@production] Exclude readme.html from being deleted during dumps::web::fetches::stats jobs

https://gerrit.wikimedia.org/r/452945

The reason the readme.html file(s) kept disappearing is that the rsync job that fetches the new datafiles from stat1005 uses the --delete flag, and the readme.html files are not on stat1005. I'd put them there, except the directory structure isn't quite the same, and it isn't clear how this would work. Instead, I modified the rsync job to --exclude readme.html, so that hopefully the files won't be considered by rsync for deletion.
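For anyone who wants to see the effect without touching the real job, here is a minimal Python sketch that models the behaviour described above. It is not the actual puppet or rsync code. It assumes a slash-free exclude pattern is matched against the file's base name and that excluded files on the receiving side are left alone by the deletion pass. The function name `files_deleted_by_sync` and the sample paths are hypothetical, made up for illustration.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def files_deleted_by_sync(source, dest, excludes=()):
    """Return the destination paths a --delete style sync would remove.

    Simplified model (an assumption, not rsync itself): a file is removed
    when it exists on the destination but not on the source, unless its
    base name matches one of the exclude patterns.
    """
    def excluded(path):
        name = PurePosixPath(path).name
        return any(fnmatch(name, pattern) for pattern in excludes)

    extraneous = set(dest) - set(source)
    return sorted(path for path in extraneous if not excluded(path))


# Hypothetical file listings: the source host has only data files,
# the web host additionally has readme.html and one stale data file.
source = {"2018/2018-08/pageviews-20180801-000000.gz"}
dest = source | {
    "readme.html",
    "2018/2018-08/pageviews-20180701-000000.gz",
}

# Without an exclude, readme.html is among the files slated for deletion.
print(files_deleted_by_sync(source, dest))

# With readme.html excluded, only the stale data file is removed.
print(files_deleted_by_sync(source, dest, excludes=["readme.html"]))
```

The first call lists readme.html among the files to be removed, which matches the disappearing-file symptom. The second call leaves it alone while still cleaning up the stale data file.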