Do readers use categories, or just editors?
Closed, Resolved · Public

Authored By: Whatamidoing-WMF, Dec 4 2018, 6:54 PM

Description

The idea behind categories on Wikipedia articles is that readers and editors will be able to use category pages to find encyclopedia articles that interest them. Every time we talk about a category-related project, I wonder: Do normal, non-editing readers actually use those to find Wikipedia articles?

My current feeling is that they (mostly) don't. They're not visible on mobile, but I've never seen any complaints about them being inaccessible there; if they were widely used, then someone would have complained about the loss of functionality.

It should be possible to use page views for the content categories (excluding hidden/maintenance categories) and compare expected vs. actual page views for logged-in and logged-out users to come up with an approximate answer to my question.
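One way to make the proposed comparison concrete is sketched below. It assumes per-category view counts, split by logged-in status, are already available, and that hidden/maintenance categories can be supplied as a set of titles (the `hidden` input and all function names here are hypothetical). The "expected" logged-in share is taken to be the article (mainspace) baseline, so a result near 1.0 would mean category pages see the same reader/editor mix as articles.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass
class ViewCounts:
    logged_in: int
    total: int

    @property
    def logged_in_share(self) -> float:
        """Fraction of views that came from logged-in users."""
        return self.logged_in / self.total if self.total else 0.0


def content_category_views(
    per_category: Dict[str, Tuple[int, int]], hidden: Set[str]
) -> ViewCounts:
    """Sum (logged_in, total) views over categories, skipping hidden/maintenance ones."""
    logged_in = total = 0
    for title, (cat_logged_in, cat_total) in per_category.items():
        if title in hidden:
            continue  # hypothetical: caller decides which titles count as hidden
        logged_in += cat_logged_in
        total += cat_total
    return ViewCounts(logged_in, total)


def overrepresentation(observed: ViewCounts, baseline: ViewCounts) -> float:
    """Ratio of the observed logged-in share to the baseline (expected) share.

    1.0 means the same reader/editor mix as the baseline; values above 1.0
    mean logged-in users are over-represented.
    """
    if baseline.logged_in_share == 0:
        raise ValueError("baseline has no logged-in views")
    return observed.logged_in_share / baseline.logged_in_share
```

For example, categories with 3 logged-in views out of 100, against an article baseline of 1 in 100, give a ratio of about 3.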

Related Objects

Mentioned In:
• T337983: [Spike] Investigate proposals "Display the categories on mobile site for everyone"
• T340606: Decide what to do about the edit-form's hidden categories list
• T211195: [Spike 16hrs] Investigate opt-in audience and instrumentation

Event Timeline

• Whatamidoing-WMF created this task. Dec 4 2018, 6:54 PM

Here is a quick, partial answer for enwiki:

There is some truth to the hypothesis that category pages are more popular among logged-in users than article pages, in that the percentage of logged-in views is several times higher for the former. On the other hand, the vast majority of views of category pages still come from anons (logged-out users).

Concretely, about 3% of views of category pages are logged-in views, compared to 0.8% for mainspace pages; on the other hand, that is still far lower than, say, for user talk pages (28% logged-in views). I generated the percentages for all namespaces on enwiki below while I was at it. See https://en.wikipedia.org/wiki/Wikipedia:Namespace for the numerical namespace IDs (category pages have ID 14).

Another thing to keep in mind is that category pages still receive far fewer pageviews than articles (less than 1/100th in the data below: about 10.6 million vs. 1.68 billion).
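The comparisons above can be checked with a few lines of arithmetic on the namespace 0 (articles) and namespace 14 (categories) rows of the table below. This is only a back-of-the-envelope sketch: the table's percentages are rounded to two decimals, so the derived logged-in counts are estimates.

```python
from typing import Dict, Tuple

# (loggedin_percentage, all_views) copied from the enwiki table for Nov 24-30, 2018.
TABLE: Dict[int, Tuple[float, int]] = {
    0: (0.83, 1_684_955_354),  # mainspace (articles)
    14: (2.90, 10_562_814),    # Category namespace
}


def logged_in_views(namespace_id: int) -> float:
    """Estimated logged-in views; approximate because the percentages are rounded."""
    pct, total = TABLE[namespace_id]
    return total * pct / 100


# Category pageviews as a fraction of article pageviews.
category_to_article_views = TABLE[14][1] / TABLE[0][1]
# How many times higher the logged-in share is on category pages.
share_multiplier = TABLE[14][0] / TABLE[0][0]
# Logged-in article views per logged-in category view (absolute counts).
article_to_category_logged_in = logged_in_views(0) / logged_in_views(14)

print(f"category/article views: {category_to_article_views:.4f}")
print(f"logged-in share, categories vs. articles: {share_multiplier:.1f}x")
print(f"logged-in article views per logged-in category view: {article_to_category_logged_in:.0f}")
```

With these figures, category pages get roughly 1/160 of the article views and a logged-in share about 3.5 times higher, yet in absolute terms logged-in users still viewed articles about 46 times as often as category pages.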

There are various directions one could explore from here:

1. For large categories that are paginated, it's possible to determine how often the subsequent pages are accessed, as a check on how much users are interested in the full content of such a large category, as opposed to just randomly clicking the category links at the bottom of articles. (E.g. the "next page" link on https://en.wikipedia.org/wiki/Category:2018_singles currently leads to https://en.wikipedia.org/w/index.php?title=Category:2018_singles&pagefrom=Disillusioned#mw-pages .)
2. It is possible in principle to exclude maintenance categories, as you suggested in the task, but that would require much more work than the quick query here.
3. Another way to answer the task's question of whether users use categories to find Wikipedia articles is to look at the number of article pageviews with a category page as referrer.
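For the first and third directions, the core step is classifying URLs: is a pageview a later page of a paginated category, and is a referrer a category page? A minimal sketch follows. It assumes only the desktop and mobile enwiki hostnames, the canonical "Category:" prefix (no namespace aliases or lowercase variants), and that "previous page" links use a `pageuntil` parameter alongside the `pagefrom` seen in the example URL above. The function names are hypothetical, and filtering the pageviews down to mainspace is left to the caller.

```python
from typing import Optional
from urllib.parse import parse_qs, unquote, urlsplit

# Assumptions: enwiki desktop/mobile hosts only, canonical "Category:" prefix only.
ENWIKI_HOSTS = {"en.wikipedia.org", "en.m.wikipedia.org"}
CATEGORY_PREFIX = "Category:"


def category_title(url: str) -> Optional[str]:
    """Return the category title a URL points to, or None for anything else."""
    parts = urlsplit(url)
    if parts.netloc not in ENWIKI_HOSTS:
        return None  # also covers empty or "-" referrers
    if parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
    elif parts.path == "/w/index.php":
        title = parse_qs(parts.query).get("title", [""])[0]
    else:
        return None
    return title if title.startswith(CATEGORY_PREFIX) else None


def is_later_category_page(url: str) -> bool:
    """A category page reached via its "next page" (or "previous page") link."""
    if category_title(url) is None:
        return False
    params = parse_qs(urlsplit(url).query)
    return "pagefrom" in params or "pageuntil" in params


def came_from_category(referer: str) -> bool:
    """True if the referrer of a (mainspace) pageview is a category page."""
    return category_title(referer) is not None
```

Counting `is_later_category_page` hits against all category views would give the pagination ratio; counting `came_from_category` over article pageviews would give the referrer figure.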

As mentioned earlier, I unfortunately don't have a lot of extra bandwidth right now and thus won't be able to tackle many of these anytime soon, although I might be able to run a query for 3. later this month, after completing a similar request for portals.

namespace_id	loggedin_percentage	all_views
NULL	0.0	17857663
-1	6.23	23937764
0	0.83	1684955354
1	8.36	2694181
2	17.17	1973589
3	28.49	882943
4	12.94	3500701
5	18.06	251068
6	1.78	9854318
7	4.55	14563
8	13.15	16071
9	20.12	3818
10	9.28	1154212
11	1.31	711601
12	4.4	775682
13	23.02	4987
14	2.9	10562814
15	3.4	161503
100	2.74	1137614
101	6.89	6820
108	1.67	32139
109	6.32	1455
118	61.8	112365
119	69.92	3132
710	4.85	1526
711	43.75	16
828	19.98	24956
829	40.29	968
2300	40.0	5
2301	0.0	2

(Data for November 24-30, 2018, known bots excluded. NULL values come from pageviews that didn't record the namespace, which IIRC includes the mobile apps. Otherwise the data above doesn't distinguish between mobile and desktop.)

Data via

SELECT namespace_id, -- cf. https://en.wikipedia.org/wiki/Wikipedia:Namespace 
-- share of pageviews flagged as logged-in in x_analytics_map
ROUND( 100 * SUM(IF(x_analytics_map['loggedIn'] IS NOT NULL,1,0)) / SUM(1), 2) AS loggedin_percentage,
SUM(1) AS all_views -- total pageviews per namespace
FROM wmf.webrequest
WHERE year = 2018 AND month = 11 AND day >= 24 -- November 24-30, 2018
  AND is_pageview
  AND pageview_info['project'] = 'en.wikipedia'
  AND agent_type = 'user' -- known bots excluded
GROUP BY namespace_id
ORDER BY namespace_id LIMIT 10000;
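To show what the conditional aggregation in this query computes, here is the same pattern run on a few made-up rows in an in-memory SQLite table. This is only an illustration, not a substitute for the Hive query: the toy table stands in for wmf.webrequest, a nullable `logged_in` column stands in for `x_analytics_map['loggedIn']`, and `CASE WHEN` replaces Hive's `IF()`.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple


def loggedin_share_by_namespace(
    rows: Iterable[Tuple[Optional[int], Optional[str]]]
) -> List[Tuple[Optional[int], float, int]]:
    """Apply the query's per-namespace aggregation to toy (namespace_id, logged_in) rows.

    logged_in is None for logged-out views, mirroring a missing 'loggedIn' key.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE pageviews (namespace_id INTEGER, logged_in TEXT)")
        conn.executemany("INSERT INTO pageviews VALUES (?, ?)", rows)
        return conn.execute(
            """
            SELECT namespace_id,
                   -- 100.0 rather than 100: SQLite's / truncates when both operands are integers
                   ROUND(100.0 * SUM(CASE WHEN logged_in IS NOT NULL THEN 1 ELSE 0 END) / SUM(1), 2)
                       AS loggedin_percentage,
                   SUM(1) AS all_views
            FROM pageviews
            GROUP BY namespace_id
            ORDER BY namespace_id
            """
        ).fetchall()
    finally:
        conn.close()
```

With three logged-out views and one logged-in view in namespace 0, plus one of each in namespace 14, this returns 25.0% of 4 views and 50.0% of 2 views respectively.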

Thanks! This is great. I'm happy.

• Tbayer moved this task from Triage to Doing on the Product-Analytics board. Dec 6 2018, 9:21 PM

Cool! I'll close this for now; I might reopen it if I get to look at 3. above later.

pmiazga mentioned this in T211195: [Spike 16hrs] Investigate opt-in audience and instrumentation. Dec 13 2018, 2:17 AM

TheDJ mentioned this in T340606: Decide what to do about the edit-form's hidden categories list. Jun 28 2023, 9:35 AM

KSiebert mentioned this in T337983: [Spike] Investigate proposals "Display the categories on mobile site for everyone". Jun 30 2023, 9:18 AM

Noting for whoever may come across this: I suspect the above findings differ to some degree from the present day. Since late 2020, the pageviews pipeline has filtered out bot traffic more reliably (ref). These bots crawl the wikis, clicking on every link, so it seems highly probable that the all_views figures above partly reflect automated traffic.

I tried running it myself on stat1007 with the following query (and other variations of it), but it always errored out with "FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Java heap space". I tried giving it more RAM, etc., but no luck yet.

SELECT ROUND( 100 * SUM(IF(x_analytics_map['loggedIn'] IS NOT NULL,1,0)) / SUM(1), 2) AS loggedin_percentage,
  SUM(1) AS all_views
FROM wmf.webrequest
WHERE year = 2023 AND month = 6 AND day >= 15 -- June 15-30, 2023
  AND is_pageview
  AND pageview_info['project'] = 'en.wikipedia'
  AND agent_type = 'user'
  AND namespace_id = 14 -- Category namespace only
LIMIT 10000;
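One workaround that might help with the heap error (untested here, so treat it as a guess) is to query one day at a time and combine the results afterwards. The sketch below generates hypothetical per-day variants of the query above that return raw counts instead of a rounded percentage, assuming each run comes back as a (loggedin_views, all_views) pair. It then combines them by summing counts before dividing; averaging the daily percentages would weight a quiet day as heavily as a busy one.

```python
from typing import Iterable, List, Tuple

# Hypothetical per-day variant of the query above: same filters, but it
# returns raw counts so that several days can be combined afterwards.
QUERY_TEMPLATE = """
SELECT SUM(IF(x_analytics_map['loggedIn'] IS NOT NULL, 1, 0)) AS loggedin_views,
       SUM(1) AS all_views
FROM wmf.webrequest
WHERE year = {year} AND month = {month} AND day = {day}
  AND is_pageview
  AND pageview_info['project'] = 'en.wikipedia'
  AND agent_type = 'user'
  AND namespace_id = 14
"""


def daily_queries(year: int, month: int, days: Iterable[int]) -> List[str]:
    """One query string per day, each to be run separately."""
    return [QUERY_TEMPLATE.format(year=year, month=month, day=d) for d in days]


def combine(daily_results: Iterable[Tuple[int, int]]) -> Tuple[float, int]:
    """Combine per-day (loggedin_views, all_views) rows into one percentage and total."""
    logged_in = total = 0
    for day_logged_in, day_total in daily_results:
        logged_in += day_logged_in
        total += day_total
    percentage = round(100 * logged_in / total, 2) if total else 0.0
    return percentage, total
```

For June 15-30, 2023, `daily_queries(2023, 6, range(15, 31))` yields 16 queries. Combining a day with 10 logged-in views out of 100 and a day with 0 out of 900 gives 1.0% of 1,000 views, not the 5.0% that averaging the two daily percentages would give.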
