
Q2 FY2025-26 Goal: Deploy Add-a-link v2 models to production
Open, Needs Triage, Public

Assigned To
Authored By
OKarakaya-WMF
Oct 30 2025, 11:30 AM

Description

Now that we have published all the new models, the next step after the previous goal is to deploy them to production.
Please note that this goal covers deploying the models and making them available for jobs.
As discussed, we have a separate goal for enabling wikis for onboarding tasks.

  • Add functionality to use new models to the inference service
      • service
      • deployment
  • Add functionality to load new models to database.
      • deployment job
  • Deploy a few new models for testing.
      • tiwiki, pwnwiki
  • Deploy new large wikis
      • zhwiki, jawiki and urwiki
  • Deploy all new models that are not currently in use.
      • See the list below
  • Deploy a few existing models based on the benchmarks. Monitor acceptance rates.
      • ltwiki: offline performance has been improved.
      • itwiki: offline performance is similar.
      • vowiki: offline performance is slightly lower but still above the threshold.
  • Deploy wikis that were enabled although they were under the release threshold in v1:
      • See the list below
  • Deploy enwiki. Monitor acceptance rates.
  • Deploy rest of the wikis that are currently being used. Monitor acceptance rates.
      • See the list below
  • The following 36 wikis in v2 do not exist in v1. (I see we have released some v1 models that are under the release threshold.)

lldwiki
guwwiki
awawiki
pcmwiki
wuuwiki
mywiki
gucwiki
smnwiki
zhwiki
dagwiki
kcgwiki
altwiki
gurwiki
taywiki
diqwiki
madwiki
shnwiki
mniwiki
fywiki
bowiki
blkwiki
skrwiki
urwiki
tiwiki
dvwiki
pwnwiki
jawiki
mnwwiki
szywiki
shiwiki
krcwiki
avkwiki
dtywiki
ganwiki
hywwiki
niawiki

  • The following wikis are common to both v1 and v2:

kswiki
lezwiki
bat_smgwiki
kbpwiki
kiwiki
lbwiki
mlwiki
xhwiki
eewiki
dewiki
pagwiki
pdcwiki
tawiki
ptwiki
simplewiki
kaawiki
chwiki
kkwiki
huwiki
furwiki
lawiki
bpywiki
roa_tarawiki
gcrwiki
maiwiki
cawiki
rmwiki
eswiki
nahwiki
nvwiki
udmwiki
anwiki
barwiki
bugwiki
be_x_oldwiki
srwiki
arywiki
bnwiki
enwiki
dinwiki
mrwiki
aywiki
zuwiki
wawiki
dsbwiki
scowiki
sqwiki
tlwiki
vecwiki
iowiki
pntwiki
kabwiki
ladwiki
rmywiki
skwiki
mrjwiki
gotwiki
iuwiki
ndswiki
guwiki
trwiki
swwiki
fiu_vrowiki
kuwiki
mhrwiki
pamwiki
kawiki
warwiki
olowiki
ltgwiki
jamwiki
cdowiki
jbowiki
lijwiki
wowiki
eowiki
bawiki
ilowiki
arzwiki
bswiki
cuwiki
cswiki
gdwiki
roa_rupwiki
afwiki
astwiki
nrmwiki
yiwiki
nsowiki
chrwiki
bgwiki
mgwiki
sahwiki
stwiki
fowiki
bclwiki
satwiki
gnwiki
arcwiki
bewiki
plwiki
myvwiki
zeawiki
nnwiki
sowiki
acewiki
htwiki
lvwiki
cebwiki
pflwiki
avwiki
frwiki
tpiwiki
ltwiki
vewiki
cewiki
pnbwiki
nlwiki
quwiki
jvwiki
miwiki
sgwiki
cowiki
adywiki
gagwiki
scnwiki
crhwiki
kvwiki
ruwiki
fjwiki
etwiki
azbwiki
cvwiki
hawwiki
nqowiki
frpwiki
bhwiki
lnwiki
smwiki
mznwiki
inhwiki
lbewiki
tgwiki
siwiki
tewiki
kshwiki
bmwiki
ruewiki
chywiki
gawiki
vowiki
bxrwiki
mtwiki
sawiki
xalwiki
minwiki
brwiki
newwiki
hiwiki
csbwiki
iawiki
glkwiki
zh_min_nanwiki
idwiki
abwiki
yowiki
omwiki
napwiki
liwiki
nywiki
sdwiki
ttwiki
vlswiki
srnwiki
thwiki
dawiki
cywiki
tswiki
tywiki
svwiki
gorwiki
gvwiki
hsbwiki
sswiki
szlwiki
tnwiki
mwlwiki
rowiki
nowiki
fawiki
itwiki
sewiki
elwiki
newiki
bjnwiki
euwiki
ffwiki
papwiki
slwiki
extwiki
frrwiki
alswiki
hakwiki
pmswiki
mswiki
towiki
kbdwiki
kywiki
arwiki
hifwiki
mkwiki
hrwiki
banwiki
iswiki
ikwiki
nds_nlwiki
tyvwiki
lmowiki
ocwiki
pcdwiki
viwiki
hewiki
ckbwiki
klwiki
emlwiki
kwwiki
pawiki
map_bmswiki
rnwiki
glwiki
cbk_zamwiki
mdfwiki
stqwiki
fiwiki
ugwiki
lgwiki
suwiki
orwiki
amwiki
twwiki
pswiki
uzwiki
vepwiki
angwiki
ukwiki
novwiki
iewiki
kgwiki
igwiki
tumwiki
hywiki
scwiki
shwiki
biwiki
rwwiki
lfnwiki
gomwiki
zawiki
kmwiki
tetwiki
atjwiki

  • The wikis that were released in v1 while under the v1 release threshold and that are above the release threshold in v2:

amwiki
arcwiki
bxrwiki
crhwiki
cuwiki
fiwiki
hywiki
igwiki
inhwiki
jbowiki
klwiki
kmwiki
mlwiki
newiki
nqowiki
orwiki
pawiki
quwiki
sawiki
siwiki
sowiki
tawiki
tewiki
thwiki
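
The three lists above can be derived with plain set operations once per-wiki offline scores are available. The following is a minimal sketch, assuming each model version has a dict mapping wiki to a single offline score and that one threshold applies to both versions; the names `classify_wikis`, `v1_scores`, `v2_scores` and the numbers are hypothetical, not taken from the benchmark notebooks.

```python
def classify_wikis(v1_scores, v2_scores, threshold):
    """Group wikis by how their v1 and v2 models compare.

    v1_scores / v2_scores: dict of wiki -> offline score (hypothetical metric).
    threshold: release threshold, assumed here to be the same for v1 and v2.
    """
    v1, v2 = set(v1_scores), set(v2_scores)
    new_in_v2 = sorted(v2 - v1)   # wikis that only have a v2 model
    common = sorted(v1 & v2)      # wikis that have both a v1 and a v2 model
    # Under the threshold in v1, at or above it in v2.
    recovered = sorted(
        w for w in common if v1_scores[w] < threshold <= v2_scores[w]
    )
    return new_in_v2, common, recovered


# Made-up scores, for illustration only.
v1_scores = {"amwiki": 0.70, "itwiki": 0.82}
v2_scores = {"amwiki": 0.78, "itwiki": 0.81, "zhwiki": 0.80}
new_in_v2, common, recovered = classify_wikis(v1_scores, v2_scores, threshold=0.75)
```

With these made-up values, zhwiki lands in the "new" list and amwiki in the "under the threshold in v1, above in v2" list.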

Event Timeline

I've collected current performance rates and counts of the candidate wikis:

image.png (1×947 px, 152 KB)

image.png (1×930 px, 152 KB)

image.png (1×984 px, 154 KB)

  • en for comparison:

image.png (1×960 px, 153 KB)

I think we have a couple of options in order to measure impact:

  • t-test on daily acceptance rates, since we only have a few daily data points.
  • z-test on the number of accepted and rejected suggestions, since we have more data there.

I think we can run both after deploying the new models. We will need to wait a few days so that the daily jobs can collect suggestions from the new models.
In addition, we need to wait until we have enough data to measure the impact. How long depends on how much improvement we expect and how accurately we want to measure it.
For both the t-test and the z-test, I think waiting about a week will be enough.
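
As a rough illustration of the two options, here is a minimal standard-library sketch: a pooled two-proportion z-test on accept/reject counts and Welch's t statistic on daily acceptance rates. The function names and all numbers are hypothetical, and this is not necessarily what the notebook does.

```python
import math
from statistics import mean, variance


def two_proportion_z_test(accept_a, reject_a, accept_b, reject_b):
    """Two-sided z-test for a difference in acceptance proportions."""
    n_a = accept_a + reject_a
    n_b = accept_b + reject_b
    p_a = accept_a / n_a
    p_b = accept_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (accept_a + accept_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def welch_t_statistic(rates_a, rates_b):
    """Welch's t statistic and degrees of freedom for two samples of daily rates."""
    var_a = variance(rates_a) / len(rates_a)
    var_b = variance(rates_b) / len(rates_b)
    t = (mean(rates_b) - mean(rates_a)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(rates_a) - 1) + var_b ** 2 / (len(rates_b) - 1)
    )
    return t, df


# Made-up numbers, for illustration only.
z, p = two_proportion_z_test(accept_a=650, reject_a=350, accept_b=760, reject_b=240)
t, df = welch_t_statistic([0.64, 0.66, 0.65, 0.67], [0.75, 0.77, 0.76, 0.78])
```

The t-test p-value needs the t distribution, which the standard library does not provide; in practice `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value.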

notebook

Reporting 14/11/2025

Progress update on the hypothesis for the week, including if something has shipped:

  • All newly onboarded wikis such as jawiki, zhwiki etc. are deployed to prod and ready for enabling onboarding tasks by the Growth Team.
  • 3 wikis currently using add-a-link are updated. I'm waiting to get some results and will continue updating more wikis based on them. I'll get back to it after the Wikipedia benchmarks work.

Any updates on metrics related to this hypothesis (including baseline, target, or actuals, if applicable):

  • N/A

Any emerging blockers or risks:

  • N/A

Any unresolved dependencies:

  • N/A

New lessons from the hypothesis:

  • N/A

Changes to the hypothesis scope or timeline:

  • N/A

We got results for itwiki:

image.png (1×903 px, 170 KB)

I've compared the two weeks before the release with the two weeks after it, leaving a 3-day buffer.
before: "2025-10-22" - "2025-11-06"
after: "2025-11-09" - "2025-11-24"

The average acceptance rate has increased from 0.65 to 0.76.

Running a t-test on the acceptance rates, we get T-statistic = 2.712, p-value = 0.012. The improvement is significant at the 0.05 level, but we would prefer T-statistic >= 3.0 to be confident.

Running a z-test on the accept/reject counts, we get p-value < 0.05, which suggests the difference is unlikely to be due to chance.

The results above are based only on accept/reject counts. We should also consider revert actions for a complete analysis.

We don't have any results for ltwiki and vowiki, as we found out that they are not used.

The initial plan was to enable enwiki next. However, my suggestion is to first deploy the wikis that were released in v1 although they were under the release threshold, so that we can get more signals before updating enwiki.

I'll create the list of wikis that were released in v1 while under the release threshold.

notebook

Started updating the following wikis:

The wikis that were released in v1 while under the v1 release threshold and that are above the release threshold in v2:
amwiki
arcwiki
bxrwiki
crhwiki
cuwiki
fiwiki
hywiki
igwiki
inhwiki
jbowiki
klwiki
kmwiki
mlwiki
newiki
nqowiki
orwiki
pawiki
quwiki
sawiki
siwiki
sowiki
tawiki
tewiki
thwiki

I've created a list of the models currently in use.
The models below have received at least one accepted or rejected suggestion since 2025-06-01.
The wikis are sorted by accept count in ascending order, so wikis nearer the top of the list are used less.
I'll split the remaining deployments into 3 batches, followed by enwiki.

  • Deployment 1: Deploy wikis between 1-50. (28/11/2025)
  • Deployment 2: Deploy wikis between 51-80. (01/12/2025)
  • Deployment 3: Deploy wikis between 81-113. (09/01/2026)
  • Deployment 4: Deploy enwiki. (12/01/2026)

Please feel free to suggest another order.

wiki_id
1 haw
2 pap
3 an
4 tl
5 als
6 nn
7 jam
8 sg
9 bh
10 inh
11 lez
12 ban
13 ilo
14 ckb
15 new
16 tyv
17 tg
18 sco
19 szl
20 bjn
21 ay
22 sc
23 ku
24 ne
25 or
26 la
27 mai
28 ka
29 mk
30 su
31 azb
32 sh
33 so
34 lg
35 lb
36 ky
37 si
38 min
39 af
40 arz
41 sq
42 ga
43 am
44 km
45 lv
46 eu
47 io
48 mg
49 qu
50 pnb
51 rue
52 ml
53 gl
54 zu
55 bi
56 mr
57 fo
58 te
59 ary
60 ie
61 kaa
62 nqo
63 et
64 ast
65 pa
66 ia
67 is
68 ht
69 ta
70 sat
71 ms
72 kk
73 hy
74 hr
75 gu
76 el
77 eo
78 sw
79 bs
80 be
81 da
82 sl
83 hu
84 rw
85 ig
86 th
87 bg
88 hi
89 yo
90 ca
91 no
92 vi
93 ff
94 fi
95 sk
96 sv
97 ro
98 de
99 ps
100 nl
101 cs
102 nds
103 bn
104 simple
105 uz
106 sr
107 fa
108 he
109 pl
110 it
111 fr
112 pt
113 id
114 en
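
The batch boundaries from the deployment plan above can be written down as a small helper. This is only a sketch: `split_into_batches` is a hypothetical name, and placeholder wiki IDs are used instead of the real list.

```python
def split_into_batches(ranked_wikis, boundaries):
    """Split a rank-ordered list into consecutive batches.

    boundaries holds the 1-based rank of the last wiki in each batch.
    """
    batches, start = [], 0
    for end in boundaries:
        batches.append(ranked_wikis[start:end])
        start = end
    return batches


# Placeholder IDs standing in for the 114 ranked wikis (least used first).
ranked = [f"wiki{rank}" for rank in range(1, 115)]
batches = split_into_batches(ranked, boundaries=[50, 80, 113, 114])
sizes = [len(batch) for batch in batches]
```

With these boundaries the batches hold 50, 30, 33 and 1 wikis, the last one being enwiki.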

  • itwiki

Looking at 17-day periods:

before: "2025-10-22" - "2025-11-06" -> Acceptance rate: 0.66
after: "2025-11-09" - "2025-11-24" -> Acceptance rate: 0.77

Now the t-test is statistically significant:
T-statistic = 3.303, p-value = 0.002

The z-test also indicates a significant difference.

So we can conclude that the acceptance rate has increased.
We should note that:

  • We compare two different time periods, so this is not a traditional A/B test. Factors other than the model may have changed between the periods.
  • We don't have versions of the predictions, so the "after" results may include some predictions from the old model despite the 3-day buffer. On the other hand, this could only make the two distributions more similar, so it would understate the difference rather than inflate it.
  • We don't consider reverted edits. Using reverted edits as negatives would lead to more reliable results.

image.png (557×867 px, 45 KB)

@OKarakaya-WMF if you have time on Monday morning your time could you please write an update for me to post in Asana? No worries at all if you don't have time!

Reporting 05/12/2025

Progress update on the hypothesis for the week, including if something has shipped:

  • As we have already deployed 36 models for the wikis that are not used in v1 onboarding tasks, we are continuing to update models for wikis that are currently in use in onboarding tasks.
  • 80 out of 114 models that are currently in use are updated.
  • We will continue updating the currently in use models as we get implicit feedback.

Any updates on metrics related to this hypothesis (including baseline, target, or actuals, if applicable):

  • We have updated itwiki and see a 16% improvement in acceptance rate, subject to the caveats noted in the earlier comment.
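
The 16% figure appears to be the relative change between the averages reported earlier (0.66 before, 0.77 after) rather than the absolute difference of 11 percentage points. That reading is an assumption, and `relative_change` is a hypothetical helper used only to show the arithmetic.

```python
def relative_change(before, after):
    """Relative change of `after` with respect to `before`."""
    return (after - before) / before


absolute_diff = 0.77 - 0.66                  # difference in acceptance rate
relative_diff = relative_change(0.66, 0.77)  # change relative to the old rate
```

Here `relative_diff` is roughly 0.167, which is in line with the reported improvement.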

Any emerging blockers or risks:

  • N/A

Any unresolved dependencies:

  • N/A

New lessons from the hypothesis:

  • Excluding continents and countries:

Feedback from Benoît Evellin: One user at Chinese Wikipedia shared that continents and countries are linked although they shouldn't be. If I remember correctly, this is something we encountered with the previous model.
Answer:
We excluded them only for enwiki,
but we can exclude them for more languages.

Related comment:
https://zh.wikipedia.org/wiki/Wikipedia:%E4%BA%92%E5%8A%A9%E5%AE%A2%E6%A0%88/%E5%85%B6%E4%BB%96#c-Kurgenera-20251204082800-Add_a_link_is_now_available_at_your_wiki

We can try different ways of creating the list of candidates:

  • case-sensitive matching.
  • favor the longer n-gram when n-grams overlap.
  • use an NER model for candidate generation.
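
To make the second idea concrete, here is a minimal sketch of a greedy selection that keeps the longest n-gram whenever candidate spans overlap. It assumes candidates are given as token spans; `select_non_overlapping` is a hypothetical helper and not part of the current add-a-link code.

```python
def select_non_overlapping(candidates):
    """Keep the longest candidate whenever spans overlap.

    candidates: list of (start, end, text) token spans, end exclusive.
    Greedy strategy: visit longer spans first (ties go to the earlier start)
    and keep a span only if it does not overlap one already kept.
    """
    chosen = []
    for span in sorted(candidates, key=lambda s: (-(s[1] - s[0]), s[0])):
        start, end, _ = span
        if all(end <= kept[0] or start >= kept[1] for kept in chosen):
            chosen.append(span)
    return sorted(chosen)


# Token spans over "New York City is in New York State" (illustrative only).
candidates = [
    (0, 2, "New York"),
    (0, 3, "New York City"),
    (1, 3, "York City"),
    (5, 7, "New York"),
    (5, 8, "New York State"),
]
selected = select_non_overlapping(candidates)
```

In this example only "New York City" and "New York State" are kept. A greedy pass is simple but not guaranteed to maximise total coverage, so it is a starting point rather than a final design.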

Changes to the hypothesis scope or timeline:

  • N/A

Reporting 12/12/2025

Progress update on the hypothesis for the week, including if something has shipped:

  • No updates this week. We will look into the acceptance rates of the new/updated models and decide how to continue with the remaining models.

Any updates on metrics related to this hypothesis (including baseline, target, or actuals, if applicable):

  • N/A

Any emerging blockers or risks:

  • N/A

Any unresolved dependencies:

  • N/A

New lessons from the hypothesis:

  • N/A

Changes to the hypothesis scope or timeline:

  • N/A

FYI, we've completed this task: T412040: Add a Link: repopulate "Add a Link" suggestions for itwiki.

Which means we could do a more accurate V1 vs. V2 comparison for itwiki now. (Related to: your comment on T408790#11415325). Essentially, due to the way the Growth feature works, your "after" data really included suggestions from both the V1 model and the V2 model. Moving forward all suggestions on itwiki are from the V2 model.

Reporting 09/01/2026
Progress update on the hypothesis for the week, including if something has shipped:

We are updating a new batch of wikis.
Please see the list of wikis in the comment linked here:
https://phabricator.wikimedia.org/T408790#11415304

Any updates on metrics related to this hypothesis (including baseline, target, or actuals, if applicable):
N/A
Any emerging blockers or risks:
N/A
Any unresolved dependencies:
N/A
New lessons from the hypothesis:
N/A
Changes to the hypothesis scope or timeline:
We can close this goal after updating enwiki.
We plan to close it next week.