
Normalize the domain names while querying for uniques based on last-access cookie
Closed, Duplicate · Public

Description

When querying for unique last-access clients grouped by uri_host in the webrequest table, some of the host names with low unique counts do not look like Wikimedia projects, and others are mixed-case variants of existing ones (e.g. EN.wikipedia.org). We should normalize these, for example en.wikipedia.org and En.wikipedia.org -> enwiki.
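A minimal sketch of the kind of normalization meant here, in Python for illustration only. The function name `normalize_host` is hypothetical, and the rules are assumptions rather than anything agreed on this task: lowercase the host, drop a trailing dot, accept only `<lang>.wikipedia.org` (optionally with an `m` mobile label), and return `None` for everything else so that non-Wikimedia hosts can be filtered out. Other project families and any special-case language codes are not handled.

```python
from typing import Optional


def normalize_host(uri_host: str) -> Optional[str]:
    """Map a raw uri_host to a '<lang>wiki' label, or None if unrecognized.

    Hypothetical helper: only the wikipedia.org family is covered.
    """
    # DNS names are case-insensitive, so fold case and drop a trailing root dot.
    host = uri_host.strip().lower().rstrip(".")
    labels = host.split(".")

    # Assumption: a mobile host looks like '<lang>.m.wikipedia.org'.
    if len(labels) == 4 and labels[1] == "m":
        labels = [labels[0]] + labels[2:]

    # Anything that is not '<lang>.wikipedia.org' is treated as unrecognized.
    if len(labels) != 3 or labels[1:] != ["wikipedia", "org"]:
        return None

    lang = labels[0]
    # Assumption: 'www' is the portal, not a language edition.
    if not lang or lang == "www":
        return None

    return lang + "wiki"


if __name__ == "__main__":
    for raw in ["EN.wikipedia.org", "En.wikipedia.org", "en.m.wikipedia.org", "example.com"]:
        print(raw, "->", normalize_host(raw))
```

In the actual query the same logic would more likely live in the grouping expression or a UDF; the point is only that grouping on the normalized value instead of the raw uri_host collapses the mixed-case variants into one row.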

Event Timeline

madhuvishy claimed this task.
madhuvishy raised the priority of this task from to Normal.
madhuvishy updated the task description.
Restricted Application added a subscriber: Aklapper. · May 5 2015, 11:36 PM
madhuvishy set Security to None.
Ottomata added a subscriber: Ottomata.

Thanks Madhu, I merged this with Yurik's ticket.

kevinator moved this task from Next Up to Done on the Analytics-Kanban board. · May 22 2015, 3:45 PM