Page MenuHomePhabricator

Examine clickthrough ratios for different page elements of action=history pages
Closed, DeclinedPublic

Description

We would like to estimate the popularity of different kinds of links on desktop history pages like https://en.wikipedia.org/w/index.php?title=Antlia&action=history , including:

  • diff links (cur, prev, Compare selected revisions etc.)
  • old revisions (e.g. https://en.wikipedia.org/w/index.php?title=Antlia&oldid=880744665 )
  • other action=history views e.g. https://en.wikipedia.org/w/index.php?title=Antlia&offset=20170119050717&action=history )
    • by number of items (20 | 50 | 100 | 250 | 500)
  • User pages
  • User talk pages
  • All non-special pages
  • User contribution pages
  • undo
  • (for logged-in views): Thanks
    • Probably not possible to determine from webrequest data alone. (The Special:Thanks link in each revision entry is caught/suppressed in JS, triggering the revealing of the Thanks confirmation link, which in turn only sends a POST API request whose parameters won't be recorded in webrequest .)
  • Special:Log? (View logs for this page)
  • Special:AbuseLog? (view filter log)
  • Show/hide revisions
  • Edit change tags
  • ...

See also https://meta.wikimedia.org/wiki/Schema:ChangesListClickTracking

Event Timeline

+----------+------------+--------+
|  limit   | logged_in  |   n    |
+----------+------------+--------+
| 500      | true       | 1104   |
| 250      | true       | 96     |
| 100      | true       | 82     |
| 50       | true       | 10     |
| 20       | true       | 7      |
| 3000     | true       | 5      |
| 5000     | true       | 3      |
| 1000     | true       | 2      |
| 2000     | true       | 2      |
| 41       | true       | 1      |
| 14       | true       | 1      |
| 1500     | true       | 1      |
| 9        | true       | 1      |
| 23       | true       | 1      |
| 24       | true       | 1      |
| 33       | true       | 1      |
+----------+------------+--------+
| 500      | false      | 37385  |
| 100      | false      | 411    |
| 250      | false      | 339    |
| 20       | false      | 307    |
| 50       | false      | 181    |
| 2000     | false      | 34     |
| 1000     | false      | 32     |
| 1        | false      | 31     |
| 2        | false      | 17     |
| 10000    | false      | 16     |
| 5000     | false      | 13     |
| default  | false      | 12     |
| 3        | false      | 12     |
| 200      | false      | 12     |
| 5        | false      | 10     |
| 4        | false      | 4      |
| 50000    | false      | 3      |
| 11       | false      | 2      |
| 800      | false      | 2      |
| 6        | false      | 2      |
| 3500     | false      | 1      |
| 232      | false      | 1      |
| 187      | false      | 1      |
| 10       | false      | 1      |
| 7        | false      | 1      |
| 3900     | false      | 1      |
| 93       | false      | 1      |
| 1500     | false      | 1      |
| 12       | false      | 1      |
+----------+------------+--------+
with dataset as (
    select
        parse_url(concat('https://', uri_host, uri_path, uri_query), 'QUERY', 'limit') as limit,
        if(x_analytics_map['loggedIn'] = '1', true, false) as logged_in
    from
        wmf.webrequest
    where
        year = 2019
        and month = 1
        and day = 28

        and agent_type = 'user'

        and normalized_host.project_family = 'wikipedia'
        and normalized_host.project = 'en'

        and uri_query like '%action=history%'
        and uri_query like '%limit=%'

        and http_status = 200
)

select
    if(limit is not null, limit, 'default') as limit,
    logged_in,
    count(*) as n
from
    dataset
group by
    logged_in,
    limit
sort by
    logged_in desc,
    n desc
limit 10000
;

A first quick-and-dirty result from one day of data:

  • diff links: ca. 32%
  • All non-special pages: ca. 25%
  • old revisions: ca. 16%
  • other action=history views: ca. 13%
SELECT SUM(IF(uri_query LIKE '%action=history%',1,0))/SUM(1) AS history,
SUM(IF(uri_query LIKE '%diff=%',1,0))/SUM(1) AS diff,
SUM(IF((uri_query LIKE '%oldid=%') AND NOT (uri_query LIKE '%diff=%'),1,0))/SUM(1) AS old_revision, 
SUM(IF((uri_path LIKE '/wiki/%') AND (uri_path NOT LIKE '%Special:%'),1,0))/SUM(1) AS nonspecial_pageview,
SUM(1) AS all_clickthroughs
FROM wmf.webrequest
WHERE year = 2019 AND month = 1 AND day = 28 
AND pageview_info['project'] = 'en.wikipedia'
AND agent_type = 'user'
AND referer LIKE '%action=history%';

history	diff	old_revision	nonspecial_pageview	all_clickthroughs
0.12833954955099167	0.3159065806534232	0.1571369115269485	0.2528693313697556	124942
Time taken: 253.591 seconds, Fetched: 1 row(s)
ovasileva moved this task from Incoming to Product Owner Backlog on the Readers-Web-Backlog board.

Augmenting the above result:

  • user pages: 4%
  • user talk pages: 2%
  • user contributions: 11%
SELECT SUM(IF(uri_query LIKE '%action=history%',1,0))/SUM(1) AS history,
SUM(IF(uri_query LIKE '%diff=%',1,0))/SUM(1) AS diff,
SUM(IF((uri_query LIKE '%oldid=%') AND NOT (uri_query LIKE '%diff=%'),1,0))/SUM(1) AS old_revision, 
SUM(IF((uri_path LIKE '/wiki/%') AND (uri_path NOT LIKE '%Special:%'),1,0))/SUM(1) AS nonspecial_pageview,
SUM(IF(uri_path LIKE '/wiki/User:%',1,0))/SUM(1) AS userpage,
SUM(IF(uri_path LIKE '/wiki/User_talk:%',1,0))/SUM(1) AS usertalkpage,
SUM(IF(uri_path LIKE '/wiki/Special:Contributions%',1,0))/SUM(1) AS contributions,
SUM(1) AS all_clickthroughs
FROM wmf.webrequest
WHERE year = 2019 AND month = 1 AND day = 28 
AND pageview_info['project'] = 'en.wikipedia'
AND agent_type = 'user'
AND referer LIKE '%action=history%';


history	diff	old_revision	nonspecial_pageview	userpage	usertalkpage	contributions	all_clickthroughs
0.12916194871647865	0.3149034671559002	0.15582135332310912	0.2538161647723201	0.04005360546947785	0.02335564505726039	0.10802075420315613	139538
Time taken: 508.294 seconds, Fetched: 1 row(s)

The analogon of the above result for logged-in users:

  • diff links: ca. 43%
  • All non-special pages (including e.g. user pages): ca. 23%
  • user contributions: 13%
  • old revisions: ca. 10%
  • other action=history views: ca. 6%
  • user pages: 4%
  • user talk pages: 3%
SELECT SUM(IF(uri_query LIKE '%action=history%',1,0))/SUM(1) AS history,
SUM(IF(uri_query LIKE '%diff=%',1,0))/SUM(1) AS diff,
SUM(IF((uri_query LIKE '%oldid=%') AND NOT (uri_query LIKE '%diff=%'),1,0))/SUM(1) AS old_revision, 
SUM(IF((uri_path LIKE '/wiki/%') AND (uri_path NOT LIKE '%Special:%'),1,0))/SUM(1) AS nonspecial_pageview,
SUM(IF(uri_path LIKE '/wiki/User:%',1,0))/SUM(1) AS userpage,
SUM(IF(uri_path LIKE '/wiki/User_talk:%',1,0))/SUM(1) AS usertalkpage,
SUM(IF(uri_path LIKE '/wiki/Special:Contributions%',1,0))/SUM(1) AS contributions,
SUM(1) AS all_clickthroughs
FROM wmf.webrequest
WHERE year = 2019 AND month = 1 AND day = 28 
AND x_analytics_map['loggedIn'] IS NOT NULL
AND referer LIKE '%action=history%'
AND pageview_info['project'] = 'en.wikipedia'
AND agent_type = 'user';

history	diff	old_revision	nonspecial_pageview	userpage	usertalkpage	contributions	all_clickthroughs
0.06321179060197148	0.42847800427473126	0.10079114428812964	0.22973490286151785	0.03655852234663604	0.028471624078859222	0.13219765846811496	62694
Time taken: 308.206 seconds, Fetched: 1 row(s)
  • Special:Log
  • Special:AbuseLog
+------+------------+--------------------+
| log  | abuse_log  | all_clickthroughs  |
+------+------------+--------------------+
| 0.1  | 0.02       | 960961             |
select
    round(sum(if(uri_query like '%Special:Log%', 1, 0)) / sum(1) * 100, 2) as log,
    round(sum(if(uri_query like '%Special:AbuseLog%', 1, 0)) / sum(1) * 100, 2) as abuse_log,
    sum(1) as all_clickthroughs
from
    wmf.webrequest
where
    year = 2019
    and month = 1
    and day = 28

    and agent_type = 'user'

    and normalized_host.project_family = 'wikipedia'
    and normalized_host.project = 'en'

    and referer like '%action=history%'
;
Aklapper added a subscriber: Aklapper.

Resetting task assignee as the user is not active here anymore. Please reassign if priority "high" is still correct.

ovasileva lowered the priority of this task from High to Low.Jun 23 2021, 10:08 AM

Removing task assignee due to inactivity, as this open task has been assigned for more than two years (see emails sent to assignee on May26 and Jun17, and T270544). Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be very welcome!

(See https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator.)

LGoto added a subscriber: LGoto.

This task was closed as part of backlog upkeep. If you believe it was closed in error, please respond on the ticket.