
What % of pages feature issues?
Closed, Resolved, Public

Description

We need to know what percentage of pages feature page issues per https://phabricator.wikimedia.org/T200792#4472739

This will inform us on a sampling rate for T200792 and be useful in reporting at the end of the page issues project.

Developer notes

Some ideas that come to mind:

  1. Use a database dump
       1. Download a dump of Wikipedia
       2. Count templates which contain an ambox class
       3. Count articles that include those templates
  2. Use MobileFormatter
       1. When a page is rendered, check for an ambox class
       2. If present, increment a stat in statsv
       3. Set a page property for every page where this is done to avoid counting it twice
  3. Count template transclusions
       1. Identify all templates that can render ambox class (Special:Search can help here)
       2. For each template, check corresponding template count https://tools.wmflabs.org/templatecount/index.php?lang=en&namespace=10&name=Ambox#bottom
       Note: this approach would lead to duplicates where more than one template is used in the same page.
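
A rough Python sketch of the render-time idea (check for an ambox class, count the page once). This is not MobileFormatter code, which is PHP; the set and counter are hypothetical stand-ins for the page property and the statsv stat, and all names are made up for illustration.

```python
from html.parser import HTMLParser


class AmboxDetector(HTMLParser):
    """Sets .found when any element carries the 'ambox' class token."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Match the exact class token, not substrings like "amboxy".
            if name == "class" and value and "ambox" in value.split():
                self.found = True


def has_ambox(html):
    detector = AmboxDetector()
    detector.feed(html)
    return detector.found


checked_pages = set()   # hypothetical stand-in for the per-page property
issue_page_count = 0    # hypothetical stand-in for the statsv counter


def record_render(page_id, html):
    """Count a page at most once, however often it is rendered."""
    global issue_page_count
    if page_id in checked_pages:
        return
    checked_pages.add(page_id)
    if has_ambox(html):
        issue_page_count += 1


record_render(1, '<table class="plainlinks ambox ambox-content"><tr><td>x</td></tr></table>')
record_render(1, '<table class="plainlinks ambox ambox-content"><tr><td>x</td></tr></table>')
record_render(2, '<p class="amboxy">no issue box here</p>')
print(issue_page_count)
```

Matching on the class token rather than on a template name would also catch pages that produce the class without transcluding Ambox.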

Event Timeline

Jdlrobson created this task.Aug 3 2018, 2:11 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 3 2018, 2:11 AM

As discussed at T200792#4475856, there seems to be some confusion here between two related but separate questions:

  1. the ratio of *pages* with issues among all pages
  2. the ratio of *pageviews* to pages with issues, among all pageviews

For example, suppose a wiki has two pages, one with issues and one without. The first page gets 8 views, and the second page gets 2 views. Then the answer to question 1 would be "50%", and the answer to question 2 would be "80%".
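
The same two-page example as a small Python sketch, to make the two ratios concrete. The page names and function names are made up for illustration.

```python
# Toy wiki from the example: page name -> (has_issues, views)
pages = {
    "page_with_issues": (True, 8),
    "page_without_issues": (False, 2),
}


def page_ratio(pages):
    """Question 1: share of pages that have issues."""
    flagged = sum(1 for has_issues, _ in pages.values() if has_issues)
    return flagged / len(pages)


def pageview_ratio(pages):
    """Question 2: share of pageviews that go to pages with issues."""
    total_views = sum(views for _, views in pages.values())
    flagged_views = sum(views for has_issues, views in pages.values() if has_issues)
    return flagged_views / total_views


print(f"{page_ratio(pages):.0%}")      # 50%
print(f"{pageview_ratio(pages):.0%}")  # 80%
```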

From the solutions proposed in the task, and the comment at T200792#4475276 ("I would not rely on anything page view based to count this"), I assume that this task is about question 1, so I'm going to correct the task title accordingly.

Tbayer renamed this task from What % of page views feature page issues? to What % of pages feature issues?.Aug 3 2018, 12:55 PM

> Count template transclusions
> Identify all templates that can render ambox class (Special:Search can help here)
> For each template, check corresponding template count https://tools.wmflabs.org/templatecount/index.php?lang=en&namespace=10&name=Ambox#bottom
> Note: this approach would lead to duplicates where more than one template is used in the same page.

Actually I think that this tool may count nested transclusions too - it appears that it simply executes the following query:

SELECT count(*) FROM templatelinks WHERE tl_title = '...' AND tl_namespace = "....";

and that the templatelinks table includes nested transclusions.

So, as a rough estimate (comparing the output of the tool with https://en.wikipedia.org/wiki/Special:Statistics and assuming, for example, that most transclusions of Template:Ambox, nested or not, happen in the article namespace), we would get that 23% of mainspace pages on English Wikipedia use Ambox or a derived template, or 2.8% of all pages including non-mainspace.

That would be consistent with this academic research result from a couple of years ago:

"A paper titled 'A Breakdown of Quality Flaws in Wikipedia'[7] examines cleanup tags on the English Wikipedia (using a January 2011 dump), finding that 27.53% of articles are tagged with at least one of altogether 388 different cleanup templates. "

Is it possible that a page contains the ambox class without transcluding the Ambox template (or a template that uses the Ambox template in turn)?

> As discussed at T200792#4475856, there seems to be some confusion here between two related but separate questions:
Yes, it seems I might be confused. Do we need to know the answer to the question "What % of pages feature issues?" If not, let's decline this task!

Not for the purpose of estimating the sampling ratio needed, which I have now done (T200792#4489268) using the above result for enwiki as a rough estimate.

It wouldn't be too hard to extend that result (T201123#4476734, about the ratio of pages that have a template named "Ambox" transcluded) across all wikis, as it is based on a simple database query. I would still be happy to do that. But this task is no longer a blocker for T200792: [EPIC] Run A/B test on page issues (Farsi, Japanese, Russian, English).

Jdlrobson added a subscriber: ovasileva.

> But this task is no longer a blocker for T200792: Run A/B test on page issues.

While I agree it would be useful, if we don't need this for reporting, let's decline this! We already have enough work on our plates. @ovasileva what do you think?

Discussed this with @Tbayer briefly. If it's a matter of a single query with no additional work from the engineers needed, I would say let's go ahead and do it. That said, it is not necessary for the A/B test itself or the posed research questions.

TheDJ added a subscriber: TheDJ.EditedAug 10 2018, 2:48 PM

https://quarry.wmflabs.org/query/28877
1,549,538 articles on en.wp use Module:Message_box at least once (it is the meta template for all the Template:*mbox templates)
https://quarry.wmflabs.org/query/28878
1,076,899 articles on en.wp use Template:Ambox

https://quarry.wmflabs.org/query/28879
There are currently 5,695,246 articles on en.wp (redirects excluded)

1,549,538/5,695,246*100 = 27% (Module:Message_box)
1,076,899/5,695,246*100 = 19% (Template:Ambox)
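
For anyone wanting to re-check the two percentages, a quick Python sketch using only the counts quoted here (the variable names are just for illustration):

```python
articles_total = 5_695_246     # en.wp articles, redirects excluded
with_message_box = 1_549_538   # articles using Module:Message_box
with_ambox = 1_076_899         # articles using Template:Ambox


def percent(part, whole):
    return 100 * part / whole


print(round(percent(with_message_box, articles_total), 1))  # 27.2
print(round(percent(with_ambox, articles_total), 1))        # 18.9
```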

https://en.wikipedia.org/wiki/Module:Message_box
Across all namespaces, Module:Message_box is used on 6,323,224 pages, which is 14%

That means there is a considerable number of page issues that do NOT come from Template:Ambox (not sure why, actually). There are some templates which use this mbox styling without making use of one of the meta templates.

Note that the Content pages count on Special:Statistics (and in Special:Random) excludes a lot of pages (redirects, stubs etc) and on some wikis can include multiple namespaces.

ovasileva triaged this task as Normal priority.Aug 21 2018, 8:06 AM
Restricted Application added a project: Product-Analytics. · View Herald TranscriptAug 21 2018, 8:06 AM
nettrom_WMF moved this task from Triage to Doing on the Product-Analytics board.Aug 23 2018, 8:15 PM

Thanks @TheDJ! (Also for taking care to count only distinct pages: the query used by the templatecount tool that the task description proposed for this question actually counts multiple template occurrences on the same page separately, cf. T201123#4476734.)
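
To illustrate the distinct-page point, here is a minimal sketch with an in-memory SQLite table standing in for templatelinks. The rows and the template names other than Ambox are hypothetical toy data, and these are not the actual Quarry queries; it only shows how summing per-template counts differs from counting distinct pages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT)"
)
# Toy data: pages 1 and 2 carry issue templates (which transclude Ambox,
# so the nested Ambox link is recorded too); page 3 has no issues.
conn.executemany(
    "INSERT INTO templatelinks VALUES (?, ?, ?)",
    [
        (1, 10, "Ambox"), (1, 10, "Unreferenced"), (1, 10, "Cleanup"),
        (2, 10, "Ambox"), (2, 10, "Unreferenced"),
        (3, 10, "Infobox"),
    ],
)

issue_templates = ("Unreferenced", "Cleanup")
marks = ",".join("?" * len(issue_templates))
where = f"tl_namespace = 10 AND tl_title IN ({marks})"

# Summing template links counts page 1 twice (it uses two issue templates).
summed = conn.execute(
    f"SELECT COUNT(*) FROM templatelinks WHERE {where}", issue_templates
).fetchone()[0]

# Counting distinct source pages gives the number of pages with issues.
distinct_pages = conn.execute(
    f"SELECT COUNT(DISTINCT tl_from) FROM templatelinks WHERE {where}", issue_templates
).fetchone()[0]

# Counting links to the meta template alone also yields one row per page here.
via_meta = conn.execute(
    "SELECT COUNT(*) FROM templatelinks WHERE tl_namespace = 10 AND tl_title = 'Ambox'"
).fetchone()[0]

print(summed, distinct_pages, via_meta)  # 3 2 2
```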

To be clear, the current work on redesigning the page issues on mobile (and the accompanying instrumentation) only affects Ambox-based templates, not other pages that use the Message box template.

> That means there is a considerable number of page issues that do NOT come from Template:Ambox (not sure why, actually). There are some templates which use this mbox styling without making use of one of the meta templates.

Looking at the articles that use Template:Message_box directly (only a small part of those that use it overall, of course), it seems it is being employed for non-issues as well, such as designing sports tables: https://en.wikipedia.org/wiki/2014_Indonesia_Super_League

On Latvian Wikipedia, the ratio of pages with issues is around 10%: https://quarry.wmflabs.org/query/29838 (using the above approach to count Ambox-using pages, adapting @TheDJ's queries and combining them into one)

Tbayer added a comment.EditedSep 28 2018, 9:50 PM

On the Persian Wikipedia, the ratio of pages with (Ambox) issues is around 5%: https://quarry.wmflabs.org/query/30030
[edited to fix link]

To wrap this up, I extended the above queries for all Wikipedias (using a PAWS notebook).

There are 303 of them according to the current sitematrix (including a few closed ones). Of those, 144 have at least one article that uses the Ambox template.
At 90%, the Cebuano Wikipedia has the highest ratio of Ambox-using articles, probably because of its template for bot-generated articles. Swedish Wikipedia also has a remarkably high ratio, at 74%.

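| wiki | articles_with_Ambox | all_articles | Ambox_ratio |
| --- | --- | --- | --- |
| aawiki | 0 | 2 | 0.0000 |
| abwiki | 14 | 3524 | 0.0040 |
| acewiki | 593 | 7996 | 0.0742 |
| adywiki | 0 | 547 | 0.0000 |
| afwiki | 14936 | 71997 | 0.2075 |
| akwiki | 0 | 630 | 0.0000 |
| alswiki | 0 | 25463 | 0.0000 |
| amwiki | 0 | 15083 | 0.0000 |
| anwiki | 0 | 34390 | 0.0000 |
| angwiki | 51 | 3137 | 0.0163 |
| arwiki | 34 | 669316 | 0.0001 |
| arcwiki | 0 | 1665 | 0.0000 |
| arzwiki | 22 | 20282 | 0.0011 |
| aswiki | 1319 | 5355 | 0.2463 |
| astwiki | 0 | 99607 | 0.0000 |
| atjwiki | 0 | 1007 | 0.0000 |
| avwiki | 75 | 2417 | 0.0310 |
| aywiki | 0 | 4643 | 0.0000 |
| azwiki | 8680 | 144434 | 0.0601 |
| azbwiki | 191 | 134594 | 0.0014 |
| bawiki | 20821 | 46704 | 0.4458 |
| barwiki | 0 | 27786 | 0.0000 |
| bat_smgwiki | 0 | 16690 | 0.0000 |
| bclwiki | 37 | 8952 | 0.0041 |
| bewiki | 2798 | 164138 | 0.0170 |
| be_x_oldwiki | 0 | 66049 | 0.0000 |
| bgwiki | 42744 | 249691 | 0.1712 |
| bhwiki | 37 | 6723 | 0.0055 |
| biwiki | 1 | 1348 | 0.0007 |
| bjnwiki | 8 | 2190 | 0.0037 |
| bmwiki | 0 | 729 | 0.0000 |
| bnwiki | 6026 | 65003 | 0.0927 |
| bowiki | 0 | 11301 | 0.0000 |
| bpywiki | 0 | 25268 | 0.0000 |
| brwiki | 0 | 66366 | 0.0000 |
| bswiki | 20354 | 79432 | 0.2562 |
| bugwiki | 0 | 14152 | 0.0000 |
| bxrwiki | 0 | 2141 | 0.0000 |
| cawiki | 0 | 605094 | 0.0000 |
| cbk_zamwiki | 0 | 3125 | 0.0000 |
| cdowiki | 5 | 14253 | 0.0004 |
| cewiki | 2334 | 209071 | 0.0112 |
| cebwiki | 4843271 | 5379607 | 0.9003 |
| chwiki | 0 | 498 | 0.0000 |
| chowiki | 0 | 14 | 0.0000 |
| chrwiki | 0 | 945 | 0.0000 |
| chywiki | 0 | 772 | 0.0000 |
| ckbwiki | 2514 | 23071 | 0.1090 |
| cowiki | 0 | 5692 | 0.0000 |
| crwiki | 0 | 118 | 0.0000 |
| crhwiki | 0 | 6319 | 0.0000 |
| cswiki | 0 | 422749 | 0.0000 |
| csbwiki | 0 | 5286 | 0.0000 |
| cuwiki | 0 | 677 | 0.0000 |
| cvwiki | 19188 | 41939 | 0.4575 |
| cywiki | 247 | 102797 | 0.0024 |
| dawiki | 1 | 243214 | 0.0000 |
| dewiki | 0 | 2273943 | 0.0000 |
| dinwiki | 0 | 226 | 0.0000 |
| diqwiki | 201 | 9420 | 0.0213 |
| dsbwiki | 410 | 3229 | 0.1270 |
| dtywiki | 96 | 3311 | 0.0290 |
| dvwiki | 33 | 4201 | 0.0079 |
| dzwiki | 0 | 290 | 0.0000 |
| eewiki | 0 | 370 | 0.0000 |
| elwiki | 28626 | 158897 | 0.1802 |
| emlwiki | 0 | 12250 | 0.0000 |
| enwiki | 1083131 | 5810276 | 0.1864 |
| eowiki | 0 | 254949 | 0.0000 |
| eswiki | 0 | 1456550 | 0.0000 |
| etwiki | 23854 | 187054 | 0.1275 |
| euwiki | 0 | 325140 | 0.0000 |
| extwiki | 0 | 3179 | 0.0000 |
| fawiki | 36566 | 666626 | 0.0549 |
| ffwiki | 0 | 298 | 0.0000 |
| fiwiki | 0 | 452283 | 0.0000 |
| fiu_vrowiki | 0 | 5545 | 0.0000 |
| fjwiki | 0 | 508 | 0.0000 |
| fowiki | 0 | 12935 | 0.0000 |
| frwiki | 0 | 2083020 | 0.0000 |
| frpwiki | 0 | 3296 | 0.0000 |
| frrwiki | 0 | 8571 | 0.0000 |
| furwiki | 0 | 3420 | 0.0000 |
| fywiki | 0 | 41520 | 0.0000 |
| gawiki | 28088 | 50603 | 0.5551 |
| gagwiki | 0 | 2886 | 0.0000 |
| ganwiki | 76 | 6492 | 0.0117 |
| gdwiki | 17 | 14884 | 0.0011 |
| glwiki | 17876 | 154606 | 0.1156 |
| glkwiki | 4762 | 5883 | 0.8095 |
| gnwiki | 0 | 3674 | 0.0000 |
| gomwiki | 259 | 4259 | 0.0608 |
| gorwiki | 0 | 3438 | 0.0000 |
| gotwiki | 0 | 715 | 0.0000 |
| guwiki | 383 | 28602 | 0.0134 |
| gvwiki | 43 | 4991 | 0.0086 |
| hawiki | 0 | 3769 | 0.0000 |
| hakwiki | 3 | 9017 | 0.0003 |
| hawwiki | 0 | 3444 | 0.0000 |
| hewiki | 0 | 238551 | 0.0000 |
| hiwiki | 24447 | 132570 | 0.1844 |
| hifwiki | 36 | 10038 | 0.0036 |
| howiki | 0 | 4 | 0.0000 |
| hrwiki | 0 | 186920 | 0.0000 |
| hsbwiki | 1121 | 13416 | 0.0836 |
| htwiki | 0 | 55980 | 0.0000 |
| huwiki | 47401 | 444817 | 0.1066 |
| hywiki | 19365 | 252250 | 0.0768 |
| hzwiki | 0 | 1 | 0.0000 |
| iawiki | 10 | 21436 | 0.0005 |
| idwiki | 22257 | 456061 | 0.0488 |
| iewiki | 0 | 4396 | 0.0000 |
| igwiki | 0 | 1455 | 0.0000 |
| iiwiki | 0 | 15 | 0.0000 |
| ikwiki | 0 | 647 | 0.0000 |
| ilowiki | 67 | 11707 | 0.0057 |
| inhwiki | 7 | 836 | 0.0084 |
| iowiki | 0 | 28726 | 0.0000 |
| iswiki | 0 | 46531 | 0.0000 |
| itwiki | 0 | 1506305 | 0.0000 |
| iuwiki | 0 | 514 | 0.0000 |
| jawiki | 233831 | 1139860 | 0.2051 |
| jamwiki | 0 | 1691 | 0.0000 |
| jbowiki | 0 | 1301 | 0.0000 |
| jvwiki | 2177 | 55486 | 0.0392 |
| kawiki | 3240 | 127145 | 0.0255 |
| kaawiki | 20 | 2047 | 0.0098 |
| kabwiki | 1 | 3407 | 0.0003 |
| kbdwiki | 25 | 1612 | 0.0155 |
| kbpwiki | 0 | 1706 | 0.0000 |
| kgwiki | 0 | 1241 | 0.0000 |
| kiwiki | 0 | 1483 | 0.0000 |
| kjwiki | 0 | 6 | 0.0000 |
| kkwiki | 98712 | 227977 | 0.4330 |
| klwiki | 0 | 1713 | 0.0000 |
| kmwiki | 284 | 9135 | 0.0311 |
| knwiki | 2757 | 24629 | 0.1119 |
| kowiki | 19320 | 446412 | 0.0433 |
| koiwiki | 1 | 3497 | 0.0003 |
| krwiki | 0 | 1 | 0.0000 |
| krcwiki | 12 | 2052 | 0.0058 |
| kswiki | 0 | 401 | 0.0000 |
| kshwiki | 0 | 2849 | 0.0000 |
| kuwiki | 2220 | 24862 | 0.0893 |
| kvwiki | 0 | 5454 | 0.0000 |
| kwwiki | 0 | 3867 | 0.0000 |
| kywiki | 723 | 80345 | 0.0090 |
| lawiki | 0 | 130073 | 0.0000 |
| ladwiki | 0 | 3887 | 0.0000 |
| lbwiki | 0 | 55809 | 0.0000 |
| lbewiki | 0 | 1244 | 0.0000 |
| lezwiki | 7 | 3977 | 0.0018 |
| lfnwiki | 0 | 3712 | 0.0000 |
| lgwiki | 0 | 2362 | 0.0000 |
| liwiki | 0 | 12363 | 0.0000 |
| lijwiki | 0 | 3633 | 0.0000 |
| lmowiki | 0 | 38191 | 0.0000 |
| lnwiki | 0 | 3196 | 0.0000 |
| lowiki | 240 | 4083 | 0.0588 |
| lrcwiki | 0 | 5570 | 0.0000 |
| ltwiki | 127 | 192077 | 0.0007 |
| ltgwiki | 2 | 917 | 0.0022 |
| lvwiki | 9186 | 91105 | 0.1008 |
| maiwiki | 116 | 14192 | 0.0082 |
| map_bmswiki | 190 | 13544 | 0.0140 |
| mdfwiki | 0 | 1341 | 0.0000 |
| mgwiki | 13 | 90973 | 0.0001 |
| mhwiki | 0 | 9 | 0.0000 |
| mhrwiki | 58 | 10132 | 0.0057 |
| miwiki | 0 | 7173 | 0.0000 |
| minwiki | 80 | 222722 | 0.0004 |
| mkwiki | 2656 | 99295 | 0.0267 |
| mlwiki | 5325 | 63125 | 0.0844 |
| mnwiki | 148 | 20834 | 0.0071 |
| mrwiki | 14 | 53474 | 0.0003 |
| mrjwiki | 0 | 10541 | 0.0000 |
| mswiki | 6114 | 324590 | 0.0188 |
| mtwiki | 568 | 3401 | 0.1670 |
| muswiki | 0 | 3 | 0.0000 |
| mwlwiki | 165 | 3985 | 0.0414 |
| mywiki | 1066 | 46097 | 0.0231 |
| myvwiki | 1 | 5506 | 0.0002 |
| mznwiki | 27 | 13113 | 0.0021 |
| nawiki | 0 | 1318 | 0.0000 |
| nahwiki | 0 | 7106 | 0.0000 |
| napwiki | 0 | 14658 | 0.0000 |
| ndswiki | 0 | 43006 | 0.0000 |
| nds_nlwiki | 0 | 6832 | 0.0000 |
| newiki | 9749 | 33510 | 0.2909 |
| newwiki | 29 | 72912 | 0.0004 |
| ngwiki | 0 | 21 | 0.0000 |
| nlwiki | 0 | 1958497 | 0.0000 |
| nnwiki | 0 | 142239 | 0.0000 |
| nowiki | 732 | 502984 | 0.0015 |
| novwiki | 0 | 1780 | 0.0000 |
| nrmwiki | 0 | 4049 | 0.0000 |
| nsowiki | 0 | 8082 | 0.0000 |
| nvwiki | 0 | 7203 | 0.0000 |
| nywiki | 0 | 548 | 0.0000 |
| ocwiki | 0 | 85590 | 0.0000 |
| olowiki | 2 | 3145 | 0.0006 |
| omwiki | 0 | 1056 | 0.0000 |
| orwiki | 1431 | 14662 | 0.0976 |
| oswiki | 19 | 11367 | 0.0017 |
| pawiki | 1265 | 32840 | 0.0385 |
| pagwiki | 0 | 5043 | 0.0000 |
| pamwiki | 169 | 8738 | 0.0193 |
| papwiki | 6 | 2134 | 0.0028 |
| pcdwiki | 0 | 4477 | 0.0000 |
| pdcwiki | 0 | 2028 | 0.0000 |
| pflwiki | 0 | 2521 | 0.0000 |
| piwiki | 0 | 3195 | 0.0000 |
| pihwiki | 0 | 748 | 0.0000 |
| plwiki | 42611 | 1320839 | 0.0323 |
| pmswiki | 0 | 64430 | 0.0000 |
| pnbwiki | 3 | 47873 | 0.0001 |
| pntwiki | 0 | 508 | 0.0000 |
| pswiki | 302 | 10787 | 0.0280 |
| ptwiki | 151622 | 1017156 | 0.1491 |
| quwiki | 0 | 21253 | 0.0000 |
| rmwiki | 0 | 3623 | 0.0000 |
| rmywiki | 0 | 693 | 0.0000 |
| rnwiki | 0 | 696 | 0.0000 |
| rowiki | 3295 | 392180 | 0.0084 |
| roa_rupwiki | 0 | 1248 | 0.0000 |
| roa_tarawiki | 0 | 9255 | 0.0000 |
| ruwiki | 218689 | 1530089 | 0.1429 |
| ruewiki | 1 | 7037 | 0.0001 |
| rwwiki | 0 | 1956 | 0.0000 |
| sawiki | 761 | 11444 | 0.0665 |
| sahwiki | 1556 | 13315 | 0.1169 |
| satwiki | 1 | 1019 | 0.0010 |
| scwiki | 13 | 6208 | 0.0021 |
| scnwiki | 0 | 26334 | 0.0000 |
| scowiki | 517 | 53708 | 0.0096 |
| sdwiki | 89 | 13287 | 0.0067 |
| sewiki | 0 | 7625 | 0.0000 |
| sgwiki | 0 | 281 | 0.0000 |
| shwiki | 5190 | 448432 | 0.0116 |
| shnwiki | 51 | 3967 | 0.0129 |
| siwiki | 982 | 19174 | 0.0512 |
| simplewiki | 8785 | 143145 | 0.0614 |
| skwiki | 0 | 228677 | 0.0000 |
| slwiki | 0 | 163709 | 0.0000 |
| smwiki | 0 | 957 | 0.0000 |
| snwiki | 0 | 4667 | 0.0000 |
| sowiki | 97 | 6750 | 0.0144 |
| sqwiki | 92 | 83414 | 0.0011 |
| srwiki | 27309 | 615884 | 0.0443 |
| srnwiki | 0 | 1158 | 0.0000 |
| sswiki | 0 | 490 | 0.0000 |
| stwiki | 0 | 609 | 0.0000 |
| stqwiki | 0 | 4065 | 0.0000 |
| suwiki | 1678 | 40183 | 0.0418 |
| svwiki | 2791232 | 3754789 | 0.7434 |
| swwiki | 797 | 48381 | 0.0165 |
| szlwiki | 644 | 7947 | 0.0810 |
| tawiki | 10109 | 126690 | 0.0798 |
| tcywiki | 3 | 1272 | 0.0024 |
| tewiki | 7714 | 70470 | 0.1095 |
| tetwiki | 0 | 1547 | 0.0000 |
| tgwiki | 28567 | 96954 | 0.2946 |
| thwiki | 25979 | 130097 | 0.1997 |
| tiwiki | 0 | 305 | 0.0000 |
| tkwiki | 0 | 6758 | 0.0000 |
| tlwiki | 6270 | 76871 | 0.0816 |
| tnwiki | 0 | 753 | 0.0000 |
| towiki | 0 | 1700 | 0.0000 |
| tpiwiki | 5 | 1522 | 0.0033 |
| trwiki | 29713 | 323607 | 0.0918 |
| tswiki | 0 | 679 | 0.0000 |
| ttwiki | 28245 | 83308 | 0.3390 |
| tumwiki | 0 | 638 | 0.0000 |
| twwiki | 0 | 697 | 0.0000 |
| tywiki | 0 | 1281 | 0.0000 |
| tyvwiki | 14 | 2121 | 0.0066 |
| udmwiki | 11 | 4630 | 0.0024 |
| ugwiki | 0 | 5982 | 0.0000 |
| ukwiki | 92545 | 888505 | 0.1042 |
| urwiki | 5181 | 145452 | 0.0356 |
| uzwiki | 1224 | 130683 | 0.0094 |
| vewiki | 1 | 338 | 0.0030 |
| vecwiki | 0 | 11359 | 0.0000 |
| vepwiki | 3 | 5901 | 0.0005 |
| viwiki | 2638 | 1203051 | 0.0022 |
| vlswiki | 0 | 6798 | 0.0000 |
| vowiki | 0 | 122600 | 0.0000 |
| wawiki | 0 | 15290 | 0.0000 |
| warwiki | 214 | 1263694 | 0.0002 |
| wowiki | 0 | 1334 | 0.0000 |
| wuuwiki | 43 | 17381 | 0.0025 |
| xalwiki | 0 | 2311 | 0.0000 |
| xhwiki | 0 | 1011 | 0.0000 |
| xmfwiki | 50 | 12773 | 0.0039 |
| yiwiki | 0 | 14742 | 0.0000 |
| yowiki | 63 | 31994 | 0.0020 |
| zawiki | 0 | 2033 | 0.0000 |
| zeawiki | 0 | 4639 | 0.0000 |
| zhwiki | 152364 | 1045997 | 0.1457 |
| zh_classicalwiki | 666 | 8878 | 0.0750 |
| zh_min_nanwiki | 4473 | 227835 | 0.0196 |
| zh_yuewiki | 3679 | 71185 | 0.0517 |
| zuwiki | 0 | 1180 | 0.0000 |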
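
As a small sanity check on how the last column relates to the other two, here is a Python sketch that recomputes Ambox_ratio as articles_with_Ambox / all_articles for a handful of rows and ranks them. It is not the PAWS notebook code; the row selection and function name are only illustrative.

```python
# (wiki, articles_with_Ambox, all_articles) for a few rows of the table
rows = [
    ("enwiki", 1083131, 5810276),
    ("cebwiki", 4843271, 5379607),
    ("itwiki", 0, 1506305),
    ("svwiki", 2791232, 3754789),
    ("lvwiki", 9186, 91105),
    ("fawiki", 36566, 666626),
]


def ambox_ratio(with_ambox, all_articles):
    """Share of articles using Ambox, rounded to 4 decimals like the table."""
    return round(with_ambox / all_articles, 4) if all_articles else 0.0


# Highest ratio first
ranked = sorted(rows, key=lambda r: ambox_ratio(r[1], r[2]), reverse=True)
for wiki, with_ambox, all_articles in ranked:
    print(f"{wiki:<10} {ambox_ratio(with_ambox, all_articles):.4f}")
```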

(Data source)

ovasileva closed this task as Resolved.Feb 19 2019, 8:11 PM

Resolving this for now.

PS: Keep in mind that the above data is, as stated, about usage of templates named "Ambox". Some wikis generate the "Ambox" class name manually from a differently named template instead (e.g. itwiki, see template source, example article) and will thus be affected/improved by the new design even though they show up with 0% in the table above.