Page MenuHomePhabricator

Examine clickthrough ratios for different page elements of action=history pages
Open, HighPublic

Description

We would like to estimate the popularity of different kinds of links on desktop history pages like https://en.wikipedia.org/w/index.php?title=Antlia&action=history , including:

  • diff links (cur, prev, Compare selected revisions etc.)
  • old revisions (e.g. https://en.wikipedia.org/w/index.php?title=Antlia&oldid=880744665 )
  • other action=history views e.g. https://en.wikipedia.org/w/index.php?title=Antlia&offset=20170119050717&action=history )
    • by number of items (20 | 50 | 100 | 250 | 500)
  • User pages
  • User talk pages
  • All non-special pages
  • User contribution pages
  • undo
  • (for logged-in views): Thanks
    • Probably not possible to determine from webrequest data alone. (The Special:Thanks link in each revision entry is caught/suppressed in JS, triggering the revealing of the Thanks confirmation link, which in turn only sends a POST API request whose parameters won't be recorded in webrequest .)
  • Special:Log? (View logs for this page)
  • Special:AbuseLog? (view filter log)
  • Show/hide revisions
  • Edit change tags
  • ...

See also https://meta.wikimedia.org/wiki/Schema:ChangesListClickTracking

Event Timeline

phuedx added a comment.EditedJan 29 2019, 7:01 PM
+----------+------------+--------+
|  limit   | logged_in  |   n    |
+----------+------------+--------+
| 500      | true       | 1104   |
| 250      | true       | 96     |
| 100      | true       | 82     |
| 50       | true       | 10     |
| 20       | true       | 7      |
| 3000     | true       | 5      |
| 5000     | true       | 3      |
| 1000     | true       | 2      |
| 2000     | true       | 2      |
| 41       | true       | 1      |
| 14       | true       | 1      |
| 1500     | true       | 1      |
| 9        | true       | 1      |
| 23       | true       | 1      |
| 24       | true       | 1      |
| 33       | true       | 1      |
+----------+------------+--------+
| 500      | false      | 37385  |
| 100      | false      | 411    |
| 250      | false      | 339    |
| 20       | false      | 307    |
| 50       | false      | 181    |
| 2000     | false      | 34     |
| 1000     | false      | 32     |
| 1        | false      | 31     |
| 2        | false      | 17     |
| 10000    | false      | 16     |
| 5000     | false      | 13     |
| default  | false      | 12     |
| 3        | false      | 12     |
| 200      | false      | 12     |
| 5        | false      | 10     |
| 4        | false      | 4      |
| 50000    | false      | 3      |
| 11       | false      | 2      |
| 800      | false      | 2      |
| 6        | false      | 2      |
| 3500     | false      | 1      |
| 232      | false      | 1      |
| 187      | false      | 1      |
| 10       | false      | 1      |
| 7        | false      | 1      |
| 3900     | false      | 1      |
| 93       | false      | 1      |
| 1500     | false      | 1      |
| 12       | false      | 1      |
+----------+------------+--------+
with dataset as (
    select
        parse_url(concat('https://', uri_host, uri_path, uri_query), 'QUERY', 'limit') as limit,
        if(x_analytics_map['loggedIn'] = '1', true, false) as logged_in
    from
        wmf.webrequest
    where
        year = 2019
        and month = 1
        and day = 28

        and agent_type = 'user'

        and normalized_host.project_family = 'wikipedia'
        and normalized_host.project = 'en'

        and uri_query like '%action=history%'
        and uri_query like '%limit=%'

        and http_status = 200
)

select
    if(limit is not null, limit, 'default') as limit,
    logged_in,
    count(*) as n
from
    dataset
group by
    logged_in,
    limit
sort by
    logged_in desc,
    n desc
limit 10000
;

A first quick-and-dirty result from one day of data:

  • diff links: ca. 32%
  • All non-special pages: ca. 25%
  • old revisions: ca. 16%
  • other action=history views: ca. 13%
SELECT SUM(IF(uri_query LIKE '%action=history%',1,0))/SUM(1) AS history,
SUM(IF(uri_query LIKE '%diff=%',1,0))/SUM(1) AS diff,
SUM(IF((uri_query LIKE '%oldid=%') AND NOT (uri_query LIKE '%diff=%'),1,0))/SUM(1) AS old_revision, 
SUM(IF((uri_path LIKE '/wiki/%') AND (uri_path NOT LIKE '%Special:%'),1,0))/SUM(1) AS nonspecial_pageview,
SUM(1) AS all_clickthroughs
FROM wmf.webrequest
WHERE year = 2019 AND month = 1 AND day = 28 
AND pageview_info['project'] = 'en.wikipedia'
AND agent_type = 'user'
AND referer LIKE '%action=history%';

history	diff	old_revision	nonspecial_pageview	all_clickthroughs
0.12833954955099167	0.3159065806534232	0.1571369115269485	0.2528693313697556	124942
Time taken: 253.591 seconds, Fetched: 1 row(s)
Tbayer updated the task description. (Show Details)Jan 29 2019, 7:34 PM
Tbayer updated the task description. (Show Details)Jan 29 2019, 8:14 PM
ovasileva triaged this task as High priority.Jan 29 2019, 9:57 PM
ovasileva moved this task from To Triage to Product Owner Backlog on the Readers-Web-Backlog board.
Tbayer updated the task description. (Show Details)Jan 30 2019, 12:26 AM

Augmenting the above result:

  • user pages: 4%
  • user talk pages: 2%
  • user contributions: 11%
SELECT SUM(IF(uri_query LIKE '%action=history%',1,0))/SUM(1) AS history,
SUM(IF(uri_query LIKE '%diff=%',1,0))/SUM(1) AS diff,
SUM(IF((uri_query LIKE '%oldid=%') AND NOT (uri_query LIKE '%diff=%'),1,0))/SUM(1) AS old_revision, 
SUM(IF((uri_path LIKE '/wiki/%') AND (uri_path NOT LIKE '%Special:%'),1,0))/SUM(1) AS nonspecial_pageview,
SUM(IF(uri_path LIKE '/wiki/User:%',1,0))/SUM(1) AS userpage,
SUM(IF(uri_path LIKE '/wiki/User_talk:%',1,0))/SUM(1) AS usertalkpage,
SUM(IF(uri_path LIKE '/wiki/Special:Contributions%',1,0))/SUM(1) AS contributions,
SUM(1) AS all_clickthroughs
FROM wmf.webrequest
WHERE year = 2019 AND month = 1 AND day = 28 
AND pageview_info['project'] = 'en.wikipedia'
AND agent_type = 'user'
AND referer LIKE '%action=history%';


history	diff	old_revision	nonspecial_pageview	userpage	usertalkpage	contributions	all_clickthroughs
0.12916194871647865	0.3149034671559002	0.15582135332310912	0.2538161647723201	0.04005360546947785	0.02335564505726039	0.10802075420315613	139538
Time taken: 508.294 seconds, Fetched: 1 row(s)

The analogon of the above result for logged-in users:

  • diff links: ca. 43%
  • All non-special pages (including e.g. user pages): ca. 23%
  • user contributions: 13%
  • old revisions: ca. 10%
  • other action=history views: ca. 6%
  • user pages: 4%
  • user talk pages: 3%
SELECT SUM(IF(uri_query LIKE '%action=history%',1,0))/SUM(1) AS history,
SUM(IF(uri_query LIKE '%diff=%',1,0))/SUM(1) AS diff,
SUM(IF((uri_query LIKE '%oldid=%') AND NOT (uri_query LIKE '%diff=%'),1,0))/SUM(1) AS old_revision, 
SUM(IF((uri_path LIKE '/wiki/%') AND (uri_path NOT LIKE '%Special:%'),1,0))/SUM(1) AS nonspecial_pageview,
SUM(IF(uri_path LIKE '/wiki/User:%',1,0))/SUM(1) AS userpage,
SUM(IF(uri_path LIKE '/wiki/User_talk:%',1,0))/SUM(1) AS usertalkpage,
SUM(IF(uri_path LIKE '/wiki/Special:Contributions%',1,0))/SUM(1) AS contributions,
SUM(1) AS all_clickthroughs
FROM wmf.webrequest
WHERE year = 2019 AND month = 1 AND day = 28 
AND x_analytics_map['loggedIn'] IS NOT NULL
AND referer LIKE '%action=history%'
AND pageview_info['project'] = 'en.wikipedia'
AND agent_type = 'user';

history	diff	old_revision	nonspecial_pageview	userpage	usertalkpage	contributions	all_clickthroughs
0.06321179060197148	0.42847800427473126	0.10079114428812964	0.22973490286151785	0.03655852234663604	0.028471624078859222	0.13219765846811496	62694
Time taken: 308.206 seconds, Fetched: 1 row(s)
  • Special:Log
  • Special:AbuseLog
+------+------------+--------------------+
| log  | abuse_log  | all_clickthroughs  |
+------+------------+--------------------+
| 0.1  | 0.02       | 960961             |
select
    round(sum(if(uri_query like '%Special:Log%', 1, 0)) / sum(1) * 100, 2) as log,
    round(sum(if(uri_query like '%Special:AbuseLog%', 1, 0)) / sum(1) * 100, 2) as abuse_log,
    sum(1) as all_clickthroughs
from
    wmf.webrequest
where
    year = 2019
    and month = 1
    and day = 28

    and agent_type = 'user'

    and normalized_host.project_family = 'wikipedia'
    and normalized_host.project = 'en'

    and referer like '%action=history%'
;
Tbayer moved this task from Triage to Doing on the Product-Analytics board.Feb 14 2019, 7:32 PM
Tbayer claimed this task.
Aklapper removed Tbayer as the assignee of this task.Mon, Apr 22, 11:49 AM
Aklapper added a subscriber: Aklapper.

Resetting task assignee as the user is not active here anymore. Please reassign if priority "high" is still correct.