
Review parent task for any potential pageview definition improvements
Open, Needs Triage · Public

Event Timeline

Nuria created this task. Jan 30 2017, 6:37 PM
Restricted Application added a subscriber: Aklapper. Jan 30 2017, 6:37 PM

Review the parent task and see whether anything should change in the pageview definition (this does not include fixing MediaWiki's problem of returning 200s for malformed requests).

Milimetric renamed this task from "Utility that creates pageview dumps should escape new lines" to "Review parent task for any potential pageview definition improvements". Feb 2 2017, 5:30 PM
Milimetric moved this task from Incoming to Backlog (Later) on the Analytics board.
mforns lowered the priority of this task from Normal to Low. Apr 19 2018, 4:55 PM
Nuria raised the priority of this task from Low to High. Sep 26 2018, 7:16 PM
Nuria moved this task from Operational Excellence to Data Quality on the Analytics board.
Ottomata raised the priority of this task from High to Needs Triage. Oct 4 2018, 5:37 PM
Ottomata moved this task from Data Quality to Deprioritized on the Analytics board.
awight added a subscriber: awight. Mar 19 2019, 9:02 AM

@Milimetric Would you mind pointing me to the definition this task will update? If there are formatting changes to how fields are delimited and escaped, we will need to find documentation to update, and write release notes for downstream consumers of the dump files.

Sorry to have missed this ping @awight, and thanks for the work! The pagecounts-raw data is the older stuff, where you updated the docs as mentioned in T144100#5053676. But the issue you're fixing will improve the pageviews data. That's documented here: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageviews. I noticed the same edit you made on the pagecounts-raw's version of page_title is there on the pageviews schema detail, so I think the docs are up to date. It is confusing, though, so ping me if you think I'm confused :)

awight added a comment. Edited Mar 27 2019, 6:40 AM

"I noticed the same edit you made on the pagecounts-raw's version of page_title is there on the pageviews schema detail"

Glad to be confused in good company! It's just because the table of column definitions is transcluded from the "raw" page. I can't say whether that's a good idea or not, though...

oh! thanks! I was confused, now I'm not :)