When querying for unique last-access clients grouped by uri_host in the webrequest table, some of the host names with low unique counts look like they are not Wikimedia projects or, in some cases, are mixed-case versions of existing ones (e.g. EN.wikipedia.org). We should normalize these, for example en.wikipedia.org/En.wikipedia.org -> enwiki.
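A minimal sketch of the kind of normalization described above, assuming only `<lang>.wikipedia.org` hosts need to be mapped. The function name `normalize_uri_host`, the regular expression, and the `<lang>wiki` naming rule are illustrative assumptions, not the implementation used in the refined webrequest table.

```python
import re
from typing import Optional

# Hypothetical pattern: only <lang>.wikipedia.org hosts are recognised here.
# Other Wikimedia projects and mobile subdomains are deliberately not covered.
_WIKIPEDIA_HOST = re.compile(r"^([a-z][a-z0-9-]*)\.wikipedia\.org$")


def normalize_uri_host(uri_host: str) -> Optional[str]:
    """Map a raw uri_host to a project name, or None if it is not recognised."""
    # Host names are case-insensitive, so fold case before matching.
    host = uri_host.strip().lower().rstrip(".")
    match = _WIKIPEDIA_HOST.match(host)
    if match is None:
        return None  # not a host this sketch knows how to normalize
    lang = match.group(1)
    if lang == "www":
        return None  # the portal host is not a language edition
    return lang + "wiki"


if __name__ == "__main__":
    for raw in ["EN.wikipedia.org", "En.wikipedia.org", "example.com"]:
        print(raw, "->", normalize_uri_host(raw))
```

Returning `None` for unrecognised hosts keeps non-Wikimedia host names separate from real projects, so they can be filtered out or counted on their own rather than silently merged.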
Description
Event Timeline
• madhuvishy claimed this task.
Restricted Application added a subscriber: Aklapper. · 2015-05-05 23:36:30 (UTC+0)
• madhuvishy updated the task description. · 2015-05-05 23:51:53 (UTC+0)
• madhuvishy set Security to None.
• kevinator updated the task description. · 2015-05-22 06:11:46 (UTC+0)
Ottomata closed this task as a duplicate of T96044: Create new normalized uri_host field in refined webrequest table {hawk} [13 pts]. · 2015-05-22 13:39:44 (UTC+0)
Ottomata subscribed.
Thanks Madhu, I merged this with Yurik's ticket.
• kevinator moved this task from Next Up to Done on the Analytics-Kanban board. · 2015-05-22 15:45:41 (UTC+0)