
Evaluate number of distinct referer hosts as a possible scoring metric
Open, Needs Triage, Public

Description

The number of distinct referer hosts per page might be a useful scoring metric, though I'm not sure. For reference, here are the top 20 pages for 2016/2/2 (with obvious search engines excluded; I'm sure I missed a few). I'm not sure exactly how to do a comparison, or how it would be integrated. It's just a thought.

The numbers are pretty small, so it may not be a distinct enough ranking signal. I'm not sure about stability either. This was basically just a thought, partially based on Greg Lindahl's comments on https://www.quora.com/How-many-iterations-should-I-do-to-compute-near-accurate-PageRank-of-Wikipedia-Articles-in-the-latest-dump

SELECT
  project,
  count(DISTINCT referer_host) AS num_referers,
  collect_set(page_title)[0] AS page_title
FROM (
  SELECT
    lower(parse_url(referer, 'HOST')) AS referer_host,
    page_id,
    pageview_info['project'] AS project,
    pageview_info['page_title'] AS page_title
  FROM webrequest
  WHERE year = 2016 AND month = 2 AND day = 2
    AND is_pageview = true
    AND page_id IS NOT NULL
    AND page_id > 1
    AND referer IS NOT NULL
    AND referer_class <> 'internal'
) x
WHERE 0 == instr(referer_host, ".google.")
  AND 0 == instr(referer_host, "duckduckgo.com")
  AND 0 == instr(referer_host, "search.yahoo.com")
  AND 0 == instr(referer_host, "yandex.ru")
  AND 0 == instr(referer_host, "baidu.com")
GROUP BY project, page_id
ORDER BY num_referers DESC
LIMIT 20;
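
For anyone who wants to try the same idea on a small sample outside Hive, the logic of the query can be sketched roughly in Python. This is only an illustration, not what was run for this task: the row shape `(project, page_id, page_title, referer)` and the function names are assumptions, and the rows are assumed to be already filtered to external-referer pageviews as in the inner query.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Substrings used to drop obvious search-engine referers (same list as the query).
SEARCH_ENGINE_MARKERS = (
    ".google.",
    "duckduckgo.com",
    "search.yahoo.com",
    "yandex.ru",
    "baidu.com",
)


def referer_host(referer):
    """Return the lower-cased host of a referer URL, or None if there is none."""
    try:
        host = urlsplit(referer).hostname
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return None
    return host.lower() if host else None


def count_distinct_referers(rows, limit=20):
    """Count distinct referer hosts per (project, page_id).

    rows: iterable of (project, page_id, page_title, referer) tuples
          (hypothetical shape, assumed pre-filtered to external pageviews).
    Returns a list of (project, page_title, num_referers), largest first.
    """
    hosts = defaultdict(set)
    titles = {}
    for project, page_id, page_title, referer in rows:
        host = referer_host(referer)
        # Rows without a parsable host are dropped, as are search engines.
        if host is None or any(m in host for m in SEARCH_ENGINE_MARKERS):
            continue
        key = (project, page_id)
        hosts[key].add(host)
        titles.setdefault(key, page_title)  # keep the first title seen
    ranked = sorted(hosts.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(key[0], titles[key], len(h)) for key, h in ranked[:limit]]
```

Like the query, this matches search engines by substring, so a bare host such as `google.com` (with no subdomain) would not be caught by the `.google.` marker.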

project       en.wikipedia
num_referers  4228
page_title    Main_Page

project       ru.wikipedia
num_referers  1431
page_title    Заглавная_страница

project       de.wikipedia
num_referers  1310
page_title    Wikipedia:Hauptseite

project       en.wikipedia
num_referers  1261
page_title    Donald_Trump

project       ja.wikipedia
num_referers  944
page_title    メインページ

project       es.wikipedia
num_referers  906
page_title    Wikipedia:Portada

project       en.wikipedia
num_referers  718
page_title    Stop_words

project       en.wikipedia
num_referers  547
page_title    World_population

project       fr.wikipedia
num_referers  546
page_title    Wikipédia:Accueil_principal

project       en.wikipedia
num_referers  487
page_title    Groundhog_Day

project       en.wikipedia
num_referers  463
page_title    HTTP_cookie

project       meta.wikimedia
num_referers  457
page_title    Help:Contents

project       ru.wikipedia
num_referers  449
page_title    Труд

project       en.wikipedia
num_referers  424
page_title    Iowa_caucuses

project       en.wikipedia
num_referers  395
page_title    Zika_virus

project       en.wikipedia
num_referers  385
page_title    United_States

project       en.wikipedia
num_referers  369
page_title    Ted_cruz

project       ar.wikipedia
num_referers  365
page_title    الصفحة_الرئيسية

project       en.wikipedia
num_referers  356
page_title    RESTful

project       en.wikipedia
num_referers  349
page_title    Facebook
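
One rough way to probe the stability question raised in the description would be to run the same count for two different days and see how much the top-N lists overlap. The sketch below is an assumption about how that comparison could be done, not something that has been run: the helper name `top_k_overlap` and the input shape (a dict mapping a page key to its `num_referers`) are hypothetical, and Jaccard overlap is just one possible measure.

```python
def top_k_overlap(day_a, day_b, k=20):
    """Jaccard overlap between the top-k pages of two days.

    day_a, day_b: dicts mapping a page key, e.g. (project, page_id),
                  to its distinct-referer count (hypothetical input shape).
    Returns a float in [0, 1]; 1.0 means the two top-k sets are identical.
    """
    def top(counts):
        # Keys with the k largest counts; ties keep dict insertion order.
        return set(sorted(counts, key=counts.get, reverse=True)[:k])

    a, b = top(day_a), top(day_b)
    if not a and not b:
        return 1.0  # two empty rankings are trivially identical
    return len(a & b) / len(a | b)
```

A value near 1.0 across many day pairs would suggest the ranking is fairly stable; a low value would suggest it is dominated by daily news (as entries like Groundhog_Day and Iowa_caucuses in the list above hint).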

Event Timeline

EBernhardson raised the priority of this task to Needs Triage.
EBernhardson updated the task description.
EBernhardson added a project: CirrusSearch.
EBernhardson added a subscriber: EBernhardson.
Restricted Application added a project: Discovery. · Feb 8 2016, 5:11 AM
Restricted Application added subscribers: StudiesWorld, Aklapper.
EBernhardson moved this task from Needs triage to Search on the Discovery board. · Feb 11 2016, 11:18 PM