
Generate section image recommendations for Vietnamese Wikipedia and other wikis
Closed, Resolved, Public

Description

In T345940: Section-level "add an image" task: Scale to all Wikipedias that have the Article-level "add an image" task, the Growth-Team wants to deploy section-level image recommendations to all Wikipedias where article-level recommendations are enabled. During the pre-deployment investigation (T345940#9176173), we discovered that Vietnamese Wikipedia has no section-level image recommendations: the search query yields zero results. Why is that?

During a follow-up investigation, we discovered that this happens for some other wikis as well. The complete data (as visible to GrowthExperiments), covering both section-level and article-level recommendations, are available at https://docs.google.com/spreadsheets/d/1Q0zHAS1Duq41MYPUPOSJwkZYVOX8y_ZJZ9Wd9knZjbA/edit#gid=0. I generated the spreadsheet with this shell sequence:

get_section_tasks() {
  # Pages with a section-level image recommendation but no article-level one
  curl -s "https://$1/w/api.php?action=query&format=json&list=search&formatversion=2&srsearch=hasrecommendation%3Aimage_section%20-hasrecommendation%3Aimage" | jq .query.searchinfo.totalhits | tr -d '\n'
}
get_article_tasks() {
  # Pages with an article-level image recommendation
  curl -s "https://$1/w/api.php?action=query&format=json&list=search&formatversion=2&srsearch=hasrecommendation%3Aimage" | jq .query.searchinfo.totalhits | tr -d '\n'
}
# wikis.txt lists one wiki domain per line; each output line is: wiki<TAB>section hits<TAB>article hits
while read -r wiki; do
  printf '%s\t' "$wiki"
  get_section_tasks "$wiki"
  printf '\t'
  get_article_tasks "$wiki"
  echo
done < wikis.txt | tee all-wiki-results.txt
# [...]
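For anyone repeating the check without curl and jq, the two shell functions translate directly into Python. This is a minimal sketch, not part of the original investigation: the function names and the User-Agent string are hypothetical, while the API parameters and search strings are copied from the curl calls. Only the URL builder and the JSON parser run without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Search strings copied from the two shell functions above.
SECTION_QUERY = "hasrecommendation:image_section -hasrecommendation:image"
ARTICLE_QUERY = "hasrecommendation:image"

# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "section-image-coverage-check/0.1 (example contact)"


def build_search_url(domain: str, srsearch: str) -> str:
    """Build the same list=search request that the curl calls send."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "formatversion": "2",
        "srsearch": srsearch,
    }
    return f"https://{domain}/w/api.php?{urlencode(params)}"


def parse_totalhits(payload: str) -> int:
    """Python counterpart of `jq .query.searchinfo.totalhits`."""
    return json.loads(payload)["query"]["searchinfo"]["totalhits"]


def count_hits(domain: str, srsearch: str) -> int:
    """Send one live API request and return its total hit count."""
    request = Request(build_search_url(domain, srsearch), headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return parse_totalhits(response.read().decode("utf-8"))


def tsv_row(domain: str) -> str:
    """One line of all-wiki-results.txt: wiki, section-level hits, article-level hits."""
    section = count_hits(domain, SECTION_QUERY)
    article = count_hits(domain, ARTICLE_QUERY)
    return f"{domain}\t{section}\t{article}"
```

Calling tsv_row once per line of wikis.txt reproduces the shell loop. Unlike jq, which prints null for a missing field, parse_totalhits raises KeyError, so failed requests surface immediately instead of ending up in the results file.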

Why is this happening, and can we add suggestions to the missing wikis somehow? For now (September 2023), this matters most for Vietnamese Wikipedia, since it is one of the first wikis where Growth intends to deploy section recommendations.
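The affected wikis can be pulled out of all-wiki-results.txt mechanically: they are the ones with article-level tasks but zero section-level tasks. This is a minimal sketch that assumes each line is wiki, section hits, article hits separated by tabs, as the loop writes them. The function names are hypothetical. Because the section query excludes pages that also carry an article-level recommendation, a zero here strictly means "no section-only tasks". Lines where jq printed null (for example after a failed request) are skipped rather than counted as zero.

```python
from typing import Iterable, List, Tuple

Row = Tuple[str, int, int]  # (wiki, section-level hits, article-level hits)


def parse_results(lines: Iterable[str]) -> List[Row]:
    """Parse the TSV lines written by the shell loop."""
    rows: List[Row] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # blank or malformed line
        wiki, section, article = parts
        try:
            rows.append((wiki, int(section), int(article)))
        except ValueError:
            continue  # e.g. jq printed "null" because the request failed
    return rows


def missing_section_wikis(rows: Iterable[Row]) -> List[str]:
    """Wikis that have article-level tasks but no section-only tasks."""
    return sorted(wiki for wiki, section, article in rows if section == 0 and article > 0)
```

In practice you would call missing_section_wikis(parse_results(open("all-wiki-results.txt"))) to get the list of wikis to investigate.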

Acceptance Criteria
  • More Wikipedia versions have section-level image recommendations (coverage similar to that of article-level image recommendations)

Event Timeline

@Urbanecm_WMF, T343844 (NEW BUG REPORT fiwiki’s section-level image suggestions aren’t generated in production) now looks fine in terms of data.
Here's the current count of section-level image suggestions (SLIS) for Vietnamese Wikipedia:

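from pyspark.sql import SparkSession
# Reuse the active Spark session (or create one) to read the image suggestions table
spark = SparkSession.builder.getOrCreate()
isu = spark.read.table('analytics_platform_eng.image_suggestions_suggestions').where('snapshot="2023-09-11"')
# Section-level image suggestions (SLIS) are the rows with a non-null section_index
slis = isu.where(isu.section_index.isNotNull())
slis.where(slis.wiki == 'viwiki').count()  # 38210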
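The Spark filter treats a non-null section_index as the marker of a section-level suggestion. The remaining rows in the table are presumably article-level suggestions, though that is an assumption and the thread does not say so. Here is the same split restated in plain Python over a few invented rows, so the logic can be checked without a cluster. The function name and the row dicts are hypothetical; only the column names wiki and section_index come from the query above.

```python
from typing import Dict, Iterable, Optional, Union

RowValue = Optional[Union[str, int]]


def count_suggestions(rows: Iterable[Dict[str, RowValue]], wiki: str) -> Dict[str, int]:
    """Count section-level vs. (presumed) article-level suggestions for one wiki."""
    counts = {"section": 0, "article": 0}
    for row in rows:
        if row["wiki"] != wiki:
            continue
        # Mirrors isu.section_index.isNotNull(): compare with None, not truthiness,
        # so that a section index of 0 is still counted as section-level.
        kind = "section" if row["section_index"] is not None else "article"
        counts[kind] += 1
    return counts
```

The explicit None check is the design point here. If section indices start at 0, a truthiness test such as `if row["section_index"]` would quietly file those rows as article-level.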

Regarding this ticket's acceptance criterion: 293 Wikipedias have at least one SLIS, so coverage is not universal. Here's the complete list with raw SLIS counts (sorry for the long paste!). You may want to skip Wikipedias with too few SLIS.

slis.select(slis.wiki).distinct().count()  # 293
slis.groupBy(slis.wiki).count().orderBy('count', ascending=False).show(n=300)
wiki              count
enwiki            237706
dewiki            129877
ruwiki            128102
frwiki            124932
eswiki            123535
itwiki            118022
ukwiki            95136
plwiki            90826
nlwiki            79564
huwiki            74956
hewiki            74620
cswiki            68765
nowiki            62103
cawiki            62063
jawiki            60551
ptwiki            58337
svwiki            57341
zhwiki            56971
arwiki            56372
fiwiki            55348
srwiki            52096
be_x_oldwiki      48349
trwiki            41341
elwiki            41114
idwiki            39665
bgwiki            39489
viwiki            38210
hrwiki            36195
rowiki            34542
shwiki            33347
dawiki            31348
simplewiki        30523
skwiki            30279
fawiki            28997
hywiki            24828
kowiki            24236
mkwiki            23015
glwiki            22539
mswiki            22387
lvwiki            21521
bewiki            21180
azwiki            20708
slwiki            19090
bswiki            19047
eowiki            18518
astwiki           18457
euwiki            18347
etwiki            18034
ltwiki            17549
bnwiki            14227
kkwiki            11855
thwiki            11848
afwiki            11295
hiwiki            10680
tawiki            10131
kawiki            9685
nnwiki            9249
sqwiki            8961
alswiki           8678
mlwiki            7545
uzwiki            7021
fywiki            5864
jvwiki            5744
urwiki            5573
iswiki            5193
lawiki            4765
ocwiki            4093
pnbwiki           3859
knwiki            3743
tewiki            3707
cywiki            3596
bawiki            3498
suwiki            3462
tlwiki            3326
pawiki            3279
scowiki           2911
lmowiki           2717
mrwiki            2715
mnwiki            2594
ttwiki            2504
swwiki            2373
anwiki            2357
tgwiki            2330
zh_classicalwiki  2218
gawiki            2204
ndswiki           2186
zh_yuewiki        2160
guwiki            2150
pswiki            2077
liwiki            2061
newiki            2039
mywiki            1934
roa_tarawiki      1927
lbwiki            1851
arzwiki           1806
hywwiki           1707
mtwiki            1668
kywiki            1508
siwiki            1252
bhwiki            1220
stqwiki           1153
kuwiki            1130
aswiki            1047
scnwiki           952
cowiki            907
vecwiki           882
iawiki            881
mgwiki            877
orwiki            869
minwiki           791
mwlwiki           755
nds_nlwiki        739
brwiki            711
ruewiki           644
wuuwiki           634
azbwiki           633
sawiki            630
kmwiki            609
iowiki            569
vepwiki           532
krcwiki           484
mznwiki           462
ckbwiki           437
yiwiki            428
hawiki            428
sowiki            413
gdwiki            406
sahwiki           400
koiwiki           393
rmwiki            383
novwiki           379
cebwiki           371
scwiki            339
barwiki           333
bclwiki           332
vlswiki           321
ladwiki           306
sdwiki            288
tkwiki            284
pmswiki           279
cvwiki            276
xmfwiki           273
cewiki            262
fiu_vrowiki       254
hifwiki           254
frrwiki           253
ilowiki           248
fowiki            241
map_bmswiki       239
maiwiki           236
extwiki           225
napwiki           207
oswiki            205
tcywiki           205
warwiki           201
dagwiki           182
bxrwiki           175
lezwiki           174
iewiki            163
igwiki            159
furwiki           140
bat_smgwiki       133
newwiki           125
htwiki            123
altwiki           117
hsbwiki           113
arywiki           109
lowiki            108
dtywiki           108
diqwiki           105
zeawiki           103
gomwiki           102
zh_min_nanwiki    92
gvwiki            85
cbk_zamwiki       85
tumwiki           81
niawiki           79
wawiki            76
awawiki           75
nsowiki           73
banwiki           73
avwiki            72
gorwiki           71
dvwiki            71
snwiki            69
mrjwiki           69
bjnwiki           63
skrwiki           63
ganwiki           60
papwiki           59
szlwiki           58
madwiki           57
fjwiki            56
avkwiki           55
szywiki           55
lldwiki           54
kswiki            52
gnwiki            50
lijwiki           50
shnwiki           48
smnwiki           47
lfnwiki           47
pcdwiki           46
glkwiki           45
inhwiki           45
kaawiki           45
emlwiki           44
pflwiki           42
sewiki            42
kbdwiki           37
dsbwiki           36
zawiki            36
pcmwiki           35
wowiki            34
myvwiki           34
xhwiki            31
smwiki            31
gurwiki           30
kbpwiki           30
gcrwiki           30
ugwiki            29
tyvwiki           29
nrmwiki           29
amwiki            28
satwiki           27
udmwiki           27
nqowiki           27
pamwiki           26
kwwiki            25
amiwiki           24
yowiki            24
hawwiki           23
shiwiki           23
frpwiki           22
bpywiki           22
mhrwiki           22
acewiki           20
trvwiki           20
kabwiki           19
roa_rupwiki       19
crhwiki           17
twwiki            17
quwiki            17
csbwiki           16
zuwiki            16
bowiki            15
lnwiki            14
omwiki            14
kshwiki           14
pntwiki           13
mnwwiki           12
nywiki            11
kvwiki            11
tywiki            11
blkwiki           11
xalwiki           10
vowiki            10
rwwiki            9
ltgwiki           9
bugwiki           8
hakwiki           8
pdcwiki           8
nahwiki           8
atjwiki           8
sswiki            7
miwiki            5
olowiki           5
tetwiki           5
jamwiki           5
ffwiki            4
lbewiki           4
bmwiki            4
tpiwiki           4
tiwiki            3
cuwiki            3
cdowiki           3
lgwiki            3
arcwiki           3
guwwiki           2
gagwiki           2
mniwiki           2
pagwiki           2
angwiki           2
gucwiki           2
gotwiki           1
adywiki           1
tswiki            1
tnwiki            1
taywiki           1
mdfwiki           1
kcgwiki           1
kiwiki            1
vewiki            1
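The groupBy/count/orderBy query, together with the suggestion to skip Wikipedias with too few SLIS, can be sketched in plain Python. This is an illustration, not the pipeline's code. slis_counts and wikis_with_enough_slis are hypothetical names, and because the thread sets no minimum, min_slis is a placeholder you would need to choose.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def slis_counts(wikis: Iterable[str]) -> List[Tuple[str, int]]:
    """Like groupBy('wiki').count().orderBy('count', ascending=False).

    Takes one wiki name per SLIS row and returns (wiki, count) pairs, highest first.
    """
    # Counter.most_common() sorts by count descending; ties keep first-seen order.
    return Counter(wikis).most_common()


def wikis_with_enough_slis(counts: List[Tuple[str, int]], min_slis: int) -> List[str]:
    """Keep wikis with at least min_slis suggestions (threshold is a placeholder)."""
    return [wiki for wiki, n in counts if n >= min_slis]
```

len(slis_counts(...)) corresponds to the distinct-wiki count (293 above). Counter.most_common keeps first-seen order for tied counts, whereas Spark's orderBy makes no ordering guarantee among ties such as yiwiki and hawiki at 428.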
Urbanecm_WMF assigned this task to mfossati.

> @Urbanecm_WMF , T343844: NEW BUG REPORT fiwiki’s section-level image suggestions aren’t generated in production looks fine now in terms of data.
> Here's the section-level image suggestions (SLIS) current count for Vietnamese Wikipedia:
>
> In [1]: isu = spark.read.table('analytics_platform_eng.image_suggestions_suggestions').where('snapshot="2023-09-11"')
> In [2]: slis = isu.where(isu.section_index.isNotNull())
> In [3]: slis.where(slis.wiki == 'viwiki').count()
> Out[3]: 38210

Thanks! This sounds perfect. I can also see the recommendations in the search index.

> Re the acceptance criterion of this ticket: we have at least 1 SLIS for 293 Wikipedias, not all. Here's the complete list with SLIS raw counts (sorry for the long paste!). You may want to skip Wikipedias with not enough SLIS.

That was a poorly written criterion on my part; it was meant as "similar coverage". For the smallest Wikipedias, getting any recommendations at all will definitely be challenging. I think 293 is good enough. Thanks for the quick help here! From Growth's point of view, I think this is good to resolve.