
Add a Link: Remove Country and Continent names in suggestions
Closed, Resolved, Public

Description

User story & summary:

As a Wikipedian, I want the "Add a link" task to better align with the Manual of Style (MOS) guidelines and the specific norms of my wiki.

Primary concern: The current link suggestion algorithm frequently recommends linking country names (Q6256) and continent names (Q5107), which violates the Manual of Style on many wikis.

We should prevent these suggestions to reduce patroller burden and improve compliance with community standards.
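A minimal sketch of the kind of filter this task asks for. It assumes each candidate link already carries the Wikidata "instance of" (P31) values of its target article. The `LinkCandidate` structure and `filter_candidates` function are hypothetical and are not how mwaddlink actually represents candidates.

```python
from dataclasses import dataclass, field

# Wikidata classes named in the task description.
EXCLUDED_CLASSES = frozenset({"Q6256", "Q5107"})  # country, continent


@dataclass
class LinkCandidate:
    """One suggested link (hypothetical structure, for illustration only)."""

    anchor: str  # surface text in the article that would become a link
    target: str  # title of the suggested target article
    instance_of: set[str] = field(default_factory=set)  # target's P31 QIDs


def filter_candidates(candidates: list[LinkCandidate]) -> list[LinkCandidate]:
    """Drop candidates whose target is a direct instance of an excluded class."""
    return [c for c in candidates if not (c.instance_of & EXCLUDED_CLASSES)]


if __name__ == "__main__":
    suggestions = [
        LinkCandidate("France", "France", {"Q6256"}),
        LinkCandidate("Asia", "Asia", {"Q5107"}),
        LinkCandidate("photosynthesis", "Photosynthesis"),  # no excluded class
    ]
    print([c.target for c in filter_candidates(suggestions)])  # ['Photosynthesis']
```

This only matches direct P31 values; an item typed solely through a subclass of country (via P279, "subclass of") would slip through unless the filter also walks the class hierarchy.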

Background & research:

Similar to T386867 (Add a Link: add "do not link" rule for country names (Q6256) on English Wikipedia), except that ideally we can make this change across all wikis, as this seems to be a recurring frustration on many wikis.

Related zhwiki discussion.

Acceptance Criteria:

Ensure the link suggestion model or data pipeline is updated to avoid linking country and continent names.
Repopulate zhwiki link suggestions.

Event Timeline

Change #1226026 had a related patch set uploaded (by Akaza24; author: Akaza24):

[research/mwaddlink@main] Filter country and continent names from link suggestions across all wikis

https://gerrit.wikimedia.org/r/1226026

Hello,

Do we need this change for all wikis or only for zhwiki?
Or do we want to change it for all wikis, with zhwiki as the most urgent?

@OKarakaya-WMF
Sorry, the description is somewhat confusing on that point. At this point we've had several wikis reach out and request this change or note that they think it would be a nice improvement.

My understanding is that, given how this data pipeline works, it would be very tricky to let communities configure this setting. So the path of least resistance (which also saves us from getting involved manually each time this request comes up) is to remove country and continent names from suggestions across all wikis.

That being said, for most wikis I don't think this is enough of an issue to justify purging and repopulating the link suggestion queue. Ideally, however, we can do that for zhwiki, where this task is very new and I would like to make sure it works well for them. I'm not sure whether this last step (which I'm probably not describing in the most technically accurate way) belongs to the Growth team. Let me know if that's the case and I can split this requirement into a subtask.

Thank you so much for your help!

Hi @KStoller-WMF,
crystal clear, thank you!

I think I can update zhwiki soon, and the Growth team can clear the recommendations after the deployment.
I'll let you know.

After that, we can proceed with the other wikis.

Adding @Sucheta-Salgaonkar-WMF to the conversation.

Thank you!

Actually, I had an idea: stop recommending popular links.
If a page already has a very large number of incoming links (e.g., it is in the 99th percentile), we could stop recommending it.

I don't actually know which page types (country, biography, etc.) are in the 99th percentile, but I think I can check this easily, and we can decide based on which pages are most frequently linked overall.

I'll proceed with the original plan, namely removing countries and continents from the recommendations.
Meanwhile, I can check which types of pages have many links.
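The "skip very popular targets" idea can be sketched as a percentile cut-off on in-link counts. This is a minimal illustration, assuming a nearest-rank percentile and a plain dict of in-link counts per target; the function names and sample counts are hypothetical, and nothing here reflects how mwaddlink currently filters candidates.

```python
import math


def percentile_cutoff(counts: list[int], pct: float) -> int:
    """Nearest-rank percentile: the smallest count with at least pct% of values <= it."""
    if not counts:
        raise ValueError("counts must be non-empty")
    ordered = sorted(counts)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def drop_popular(targets: list[str], inlinks: dict[str, int], pct: float = 99.0) -> list[str]:
    """Keep targets whose in-link count is strictly below the percentile cut-off."""
    cutoff = percentile_cutoff(list(inlinks.values()), pct)
    return [t for t in targets if inlinks.get(t, 0) < cutoff]


if __name__ == "__main__":
    # Hypothetical in-link counts: PageN has N incoming links.
    inlinks = {f"Page{i}": i for i in range(1, 101)}
    print(percentile_cutoff(list(inlinks.values()), 99))  # 99
    print(drop_popular(["Page1", "Page99", "Page100"], inlinks))  # ['Page1']
```

With the strict `<`, a target sitting exactly at the cut-off is dropped as well; when many pages share the same count, that tie handling may matter.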

I've trained a model without countries and continents for zhwiki.
We get similar F1 scores. I'll proceed with deploying it.

v2 (current prod)

threshold	N	micro_precision	micro_recall	wiki_db
0.5	10000	0.7828906733	0.432107183	zhwiki

v3 (countries & continents removed)

threshold	N	micro_precision	micro_recall	wiki_db
0.5	10000	0.811832087727876	0.387239713476595	zhwiki
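For a single-number comparison of the two runs, F1 (the harmonic mean of precision and recall) can be computed from the reported values; with micro-averaged inputs this is micro-F1. A small sketch using only the numbers in the tables:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (micro_precision, micro_recall) for zhwiki at threshold 0.5, N = 10000
RESULTS = {
    "v2 (current prod)": (0.7828906733, 0.432107183),
    "v3 (countries & continents removed)": (0.811832087727876, 0.387239713476595),
}

for label, (p, r) in RESULTS.items():
    print(f"{label}: F1 = {f1_score(p, r):.3f}")
```

This works out to about 0.557 for v2 and 0.524 for v3: v3 trades some recall for higher precision.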

zhwiki v2 model checksum:

c4796c3c193d983980a445bb2a76f65def9f2459599fa6df055984bd851d3ca3 zhwiki.linkmodel.json

zhwiki new model checksum:

c4950228598e64c08ae817df316f2f3127d93df27dfbcddfadd5f2550586bdff zhwiki.linkmodel.json
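The checksums above appear to be SHA-256 digests (64 hex characters) followed by the file name. Assuming that, a downloaded model file can be checked against one of these lines with Python's standard `hashlib`; the helper names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a large model file need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_line(path: Path, line: str) -> bool:
    """Compare the file's digest with the first field of a '<digest> <name>' line."""
    expected = line.split()[0].lower()
    return sha256_of(path) == expected
```

For example, `matches_checksum_line(Path("zhwiki.linkmodel.json"), "<digest> zhwiki.linkmodel.json")` should return `True` only for the file whose digest appears in that line.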

The new model is deployed to prod.

Looking into the models that we need to update based on:
Frontend enabled models: https://gerrit.wikimedia.org/r/plugins/gitiles/operations/mediawiki-config/+/refs/heads/master/wmf-config/ext-GrowthExperiments.php
Wikis supported by v2: https://analytics.wikimedia.org/published/wmf-ml-models/addalink/v2/

  • We exclude the following wikis, which are frontend_enabled but either not supported by v2 or no longer in use:

{'tkwiki', 'oswiki', 'azwiki', 'testwiki', 'mnwiki', 'lowiki', 'koiwiki', 'pihwiki', 'kowiki', 'hawiki', 'knwiki', 'xmfwiki', 'tcywiki'}

  • We exclude the following wikis, which are supported by v2 but not frontend_enabled:

{'nrmwiki', 'ltwiki', 'nvwiki', 'bugwiki', 'ikwiki', 'klwiki', 'chwiki', 'bpywiki', 'kgwiki', 'jbowiki', 'novwiki'}

  • We will also exclude any wikis that fall below the release threshold; we will identify them after training.

Therefore, we will update the following wikis:
0-abwiki
1-acewiki
2-adywiki
3-afwiki
4-alswiki
5-altwiki
6-amwiki
7-angwiki
8-anwiki
9-arcwiki
10-arwiki
11-arywiki
12-arzwiki
13-astwiki
14-atjwiki
15-avkwiki
16-avwiki
17-awawiki
18-aywiki
19-azbwiki
20-banwiki
21-barwiki
22-bat_smgwiki
23-bawiki
24-bclwiki
25-be_x_oldwiki
26-bewiki
27-bgwiki
28-bhwiki
29-biwiki
30-bjnwiki
31-blkwiki
32-bmwiki
33-bnwiki
34-bowiki
35-brwiki
36-bswiki
37-bxrwiki
38-cawiki
39-cbk_zamwiki
40-cdowiki
41-cebwiki
42-cewiki
43-chrwiki
44-chywiki
45-ckbwiki
46-cowiki
47-crhwiki
48-csbwiki
49-cswiki
50-cuwiki
51-cvwiki
52-cywiki
53-dagwiki
54-dawiki
55-dewiki
56-dinwiki
57-diqwiki
58-dsbwiki
59-dtywiki
60-dvwiki
61-eewiki
62-elwiki
63-emlwiki
64-enwiki :done
65-eowiki
66-eswiki
67-etwiki
68-euwiki
69-extwiki
70-fawiki
71-ffwiki
72-fiu_vrowiki
73-fiwiki
74-fjwiki
75-fowiki
76-frpwiki
77-frrwiki
78-frwiki
79-furwiki
80-fywiki
81-gagwiki
82-ganwiki
83-gawiki
84-gcrwiki
85-gdwiki
86-glkwiki
87-glwiki
88-gnwiki
89-gomwiki
90-gorwiki
91-gotwiki
92-gucwiki
93-gurwiki
94-guwiki
95-guwwiki
96-gvwiki
97-hakwiki
98-hawwiki
99-hewiki
100-hifwiki
101-hiwiki
102-hrwiki
103-hsbwiki
104-htwiki
105-huwiki
106-hywiki
107-hywwiki
108-iawiki
109-idwiki
110-iewiki
111-igwiki
112-ilowiki
113-inhwiki
114-iowiki
115-iswiki
116-itwiki
117-iuwiki
118-jamwiki
119-jawiki
120-jvwiki
121-kaawiki
122-kabwiki
123-kawiki
124-kbdwiki
125-kbpwiki
126-kcgwiki
127-kiwiki
128-kkwiki
129-kmwiki
130-krcwiki
131-kshwiki
132-kswiki
133-kuwiki
134-kvwiki
135-kwwiki
136-kywiki
137-ladwiki
138-lawiki
139-lbewiki
140-lbwiki
141-lezwiki
142-lfnwiki
143-lgwiki
144-lijwiki
145-liwiki
146-lldwiki
147-lmowiki
148-lnwiki
149-ltgwiki
150-lvwiki
151-madwiki
152-maiwiki
153-map_bmswiki
154-mdfwiki
155-mgwiki
156-mhrwiki
157-minwiki
158-miwiki
159-mkwiki
160-mlwiki
161-mniwiki
162-mnwwiki
163-mrjwiki
164-mrwiki
165-mswiki
166-mtwiki
167-mwlwiki
168-myvwiki
169-mywiki
170-mznwiki
171-nahwiki
172-napwiki
173-nds_nlwiki
174-ndswiki
175-newiki
176-newwiki
177-niawiki
178-nlwiki
179-nnwiki
180-nowiki
181-nqowiki
182-nsowiki
183-nywiki
184-ocwiki
185-olowiki
186-omwiki
187-orwiki
188-pagwiki
189-pamwiki
190-papwiki
191-pawiki
192-pcdwiki
193-pcmwiki
194-pdcwiki
195-pflwiki
196-plwiki
197-pmswiki
198-pnbwiki
199-pntwiki
200-pswiki
201-ptwiki
202-pwnwiki
203-quwiki
204-rmwiki
205-rmywiki
206-rnwiki
207-roa_rupwiki
208-roa_tarawiki
209-rowiki
210-ruewiki
211-ruwiki
212-rwwiki
213-sahwiki
214-satwiki
215-sawiki
216-scnwiki
217-scowiki
218-scwiki
219-sdwiki
220-sewiki
221-sgwiki
222-shiwiki
223-shnwiki
224-shwiki
225-simplewiki
226-siwiki
227-skrwiki
228-skwiki
229-slwiki
230-smnwiki
231-smwiki
232-sowiki
233-sqwiki
234-srnwiki
235-srwiki
236-sswiki
237-stqwiki
238-stwiki
239-suwiki
240-svwiki
241-swwiki
242-szlwiki
243-szywiki
244-tawiki
245-taywiki
246-tetwiki
247-tewiki
248-tgwiki
249-thwiki
250-tiwiki
251-tlwiki
252-tnwiki
253-towiki
254-tpiwiki
255-trwiki
256-tswiki
257-ttwiki
258-tumwiki
259-twwiki
260-tyvwiki
261-tywiki
262-udmwiki
263-ugwiki
264-ukwiki
265-urwiki
266-uzwiki
267-vecwiki
268-vepwiki
269-vewiki
270-viwiki
271-vlswiki
272-vowiki
273-warwiki
274-wawiki
275-wowiki
276-wuuwiki
277-xalwiki
278-xhwiki
279-yiwiki
280-yowiki
281-zawiki
282-zeawiki
283-zh_min_nanwiki
284-zhwiki :done
285-zuwiki
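The selection criteria listed before this list amount to a few set operations. A minimal sketch, assuming the inputs are plain sets of `wiki_db` names; the function name and the `retired` parameter (for wikis dropped because they are no longer used) are illustrative, not part of the actual tooling.

```python
from collections.abc import Set as AbstractSet


def wikis_to_update(
    frontend_enabled: AbstractSet[str],
    v2_supported: AbstractSet[str],
    below_threshold: AbstractSet[str],
    retired: AbstractSet[str] = frozenset(),
) -> list[str]:
    """Return the sorted wiki_db names that should get a retrained model.

    frontend_enabled & v2_supported removes both exclusion groups (enabled
    without a v2 model, and v2 models that are not enabled); `retired`
    covers wikis excluded because they are no longer in use.
    """
    eligible = (frontend_enabled & v2_supported) - retired
    return sorted(eligible - below_threshold)


if __name__ == "__main__":
    # Tiny samples drawn from the lists in this task.
    frontend = {"cswiki", "dewiki", "kowiki", "adywiki"}
    v2 = {"cswiki", "dewiki", "ltwiki", "adywiki"}
    below = {"adywiki"}
    print(wikis_to_update(frontend, v2, below))  # ['cswiki', 'dewiki']
```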

dinwiki has failed: it does not have enough data for training, and it is one of the smallest wikis.

369 articles.
12 active users.

https://en.wikipedia.org/wiki/List_of_Wikipedias

I'll remove it from the list.

26/01/22 19:01:53 INFO DAGScheduler: Job 1 finished: toPandas at /var/lib/hadoop/data/k/yarn/local/usercache/analytics-ml/appcache/application_1764064841637_1311131/container_e145_1764064841637_1311131_01_000001/venv/lib/python3.10/site-packages/add_a_link/generate_addlink_model.py:128, took 1.044839 s
dinwiki 3
Total training data size: 3
Traceback (most recent call last):
  File "/var/lib/hadoop/data/k/yarn/local/usercache/analytics-ml/appcache/application_1764064841637_1311131/container_e145_1764064841637_1311131_01_000001/venv/bin/generate_addlink_model.py", line 7, in <module>
    sys.exit(main())
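The log shows dinwiki reaching training with only 3 rows before the job fails. One way to handle such wikis is a pre-flight check that skips them instead of crashing. This is a hypothetical sketch: `MIN_TRAINING_ROWS`, `InsufficientTrainingData`, and `trainable` are not part of the pipeline, and the cut-off value is arbitrary.

```python
MIN_TRAINING_ROWS = 1000  # hypothetical cut-off, not a value used by the pipeline


class InsufficientTrainingData(Exception):
    """Raised when a wiki has too few rows to train a link model."""


def check_training_size(wiki_db: str, n_rows: int, minimum: int = MIN_TRAINING_ROWS) -> None:
    """Raise instead of training when the dataset is below the minimum size."""
    if n_rows < minimum:
        raise InsufficientTrainingData(f"{wiki_db}: {n_rows} training rows (< {minimum}); skipping")


def trainable(sizes: dict[str, int], minimum: int = MIN_TRAINING_ROWS) -> list[str]:
    """Return the wikis that pass the check, printing the ones that are skipped."""
    ok = []
    for wiki_db, n_rows in sizes.items():
        try:
            check_training_size(wiki_db, n_rows, minimum)
        except InsufficientTrainingData as err:
            print(err)
            continue
        ok.append(wiki_db)
    return ok


if __name__ == "__main__":
    # "examplewiki" and its size are made up; dinwiki's 3 rows come from the log above.
    print(trainable({"dinwiki": 3, "examplewiki": 50_000}))
```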

Training and staging deployments are complete.
The following wikis are below the release threshold. I'll exclude them from the production deployment and start the deployment.

	wiki_db	above_threshold
0	adywiki	0
1	bawiki	0
2	biwiki	0
3	bmwiki	0
4	chrwiki	0
5	chywiki	0
6	eewiki	0
7	ffwiki	0
8	gagwiki	0
9	glkwiki	0
10	guwiki	0
11	hywwiki	0
12	kiwiki	0
13	lgwiki	0
14	madwiki	0
15	mhrwiki	0
16	nywiki	0
17	omwiki	0
18	pagwiki	0
19	pswiki	0
20	pwnwiki	0
21	quwiki	0
22	rnwiki	0
23	roa_tarawiki	0
24	shiwiki	0
25	smwiki	0
26	sswiki	0
27	tswiki	0
28	tumwiki	0
29	twwiki	0
30	vepwiki	0
31	vowiki	0
32	xhwiki	0

Final list of wikis to update, with the release date today (26/01/2026):

["abwiki", "acewiki", "afwiki", "alswiki", "altwiki", "amwiki", "angwiki", "anwiki", "arcwiki", "arwiki", "arywiki", "arzwiki", "astwiki", "atjwiki", "avkwiki", "avwiki", "awawiki", "aywiki", "azbwiki", "banwiki", "barwiki", "bat_smgwiki", "bclwiki", "be_x_oldwiki", "bewiki", "bgwiki", "bhwiki", "bjnwiki", "blkwiki", "bnwiki", "bowiki", "brwiki", "bswiki", "bxrwiki", "cawiki", "cbk_zamwiki", "cdowiki", "cebwiki", "cewiki", "ckbwiki", "cowiki", "crhwiki", "csbwiki", "cswiki", "cuwiki", "cvwiki", "cywiki", "dagwiki", "dawiki", "dewiki", "diqwiki", "dsbwiki", "dtywiki", "dvwiki", "elwiki", "emlwiki", "eowiki", "eswiki", "etwiki", "euwiki", "extwiki", "fawiki", "fiu_vrowiki", "fiwiki", "fjwiki", "fowiki", "frpwiki", "frrwiki", "frwiki", "furwiki", "fywiki", "ganwiki", "gawiki", "gcrwiki", "gdwiki", "glwiki", "gnwiki", "gomwiki", "gorwiki", "gotwiki", "gucwiki", "gurwiki", "guwwiki", "gvwiki", "hakwiki", "hawwiki", "hewiki", "hifwiki", "hiwiki", "hrwiki", "hsbwiki", "htwiki", "huwiki", "hywiki", "iawiki", "idwiki", "iewiki", "igwiki", "ilowiki", "inhwiki", "iowiki", "iswiki", "itwiki", "iuwiki", "jamwiki", "jawiki", "jvwiki", "kaawiki", "kabwiki", "kawiki", "kbdwiki", "kbpwiki", "kcgwiki", "kkwiki", "kmwiki", "krcwiki", "kshwiki", "kswiki", "kuwiki", "kvwiki", "kwwiki", "kywiki", "ladwiki", "lawiki", "lbewiki", "lbwiki", "lezwiki", "lfnwiki", "lijwiki", "liwiki", "lldwiki", "lmowiki", "lnwiki", "ltgwiki", "lvwiki", "maiwiki", "map_bmswiki", "mdfwiki", "mgwiki", "minwiki", "miwiki", "mkwiki", "mlwiki", "mniwiki", "mnwwiki", "mrjwiki", "mrwiki", "mswiki", "mtwiki", "mwlwiki", "myvwiki", "mywiki", "mznwiki", "nahwiki", "napwiki", "nds_nlwiki", "ndswiki", "newiki", "newwiki", "niawiki", "nlwiki", "nnwiki", "nowiki", "nqowiki", "nsowiki", "ocwiki", "olowiki", "orwiki", "pamwiki", "papwiki", "pawiki", "pcdwiki", "pcmwiki", "pdcwiki", "pflwiki", "plwiki", "pmswiki", "pnbwiki", "pntwiki", "ptwiki", "rmwiki", "rmywiki", "roa_rupwiki", "rowiki", "ruewiki", "ruwiki", "rwwiki", 
"sahwiki", "satwiki", "sawiki", "scnwiki", "scowiki", "scwiki", "sdwiki", "sewiki", "sgwiki", "shnwiki", "shwiki", "simplewiki", "siwiki", "skrwiki", "skwiki", "slwiki", "smnwiki", "sowiki", "sqwiki", "srnwiki", "srwiki", "stqwiki", "stwiki", "suwiki", "svwiki", "swwiki", "szlwiki", "szywiki", "tawiki", "taywiki", "tetwiki", "tewiki", "tgwiki", "thwiki", "tiwiki", "tlwiki", "tnwiki", "towiki", "tpiwiki", "trwiki", "ttwiki", "tyvwiki", "tywiki", "udmwiki", "ugwiki", "ukwiki", "urwiki", "uzwiki", "vecwiki", "vewiki", "viwiki", "vlswiki", "warwiki", "wawiki", "wowiki", "wuuwiki", "xalwiki", "yiwiki", "yowiki", "zawiki", "zeawiki", "zh_min_nanwiki", "zuwiki"]

The following wikis are deployed to prod, and the others are in the queue.
Large wikis (e.g. dewiki) take a long time (~4 hours), while small wikis (e.g. hiwiki) get deployed quickly (~10 minutes).
Overall, I think it still makes sense to deploy them sequentially so as not to add too much load to MariaDB.

ozge@deploy2002:~$ kubectl logs linkrecommendation-internal-load-datasets-29490420-2w4rp | grep "Finished"
{"written_at": "2026-01-26T11:46:33.956Z", "written_ts": 1769427993956543000, "msg": "Finished importing for cswiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-26T12:53:47.576Z", "written_ts": 1769432027576424000, "msg": "Finished importing for viwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-26T13:09:40.282Z", "written_ts": 1769432980282786000, "msg": "Finished importing for bnwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-26T16:37:33.276Z", "written_ts": 1769445453276614000, "msg": "Finished importing for frwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-26T16:57:22.956Z", "written_ts": 1769446642956707000, "msg": "Finished importing for elwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-26T20:56:05.057Z", "written_ts": 1769460965057116000, "msg": "Finished importing for dewiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-26T22:20:47.001Z", "written_ts": 1769466047001129000, "msg": "Finished importing for ptwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-26T22:59:03.831Z", "written_ts": 1769468343831399000, "msg": "Finished importing for huwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T00:22:40.305Z", "written_ts": 1769473360305019000, "msg": "Finished importing for fawiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T01:04:34.180Z", "written_ts": 1769475874180359000, "msg": "Finished importing for rowiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T03:01:36.975Z", "written_ts": 1769482896975608000, "msg": "Finished importing for plwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T05:23:12.892Z", "written_ts": 1769491392892474000, "msg": "Finished importing for nlwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T06:20:38.364Z", "written_ts": 1769494838364937000, "msg": "Finished importing for cawiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T06:50:28.072Z", "written_ts": 1769496628072412000, "msg": "Finished importing for hewiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T07:00:26.660Z", "written_ts": 1769497226660521000, "msg": "Finished importing for hiwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T07:44:41.185Z", "written_ts": 1769499881185127000, "msg": "Finished importing for nowiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
{"written_at": "2026-01-27T10:27:19.240Z", "written_ts": 1769509639240561000, "msg": "Finished importing for svwiki!", "type": "log", "logger": "__main__", "thread": "MainThread", "level": "INFO", "module": "load-datasets", "line_no": 445}
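The per-wiki durations can be estimated from these "Finished importing" lines. A minimal sketch, assuming imports run strictly one after another, so each wiki's duration is approximated by the gap since the previous "Finished" line (the first wiki in an excerpt has no predecessor and is skipped). The function name is illustrative.

```python
import json
import re
from datetime import datetime

FINISHED = re.compile(r"Finished importing for (\w+)!")


def import_durations(lines: list[str]) -> dict[str, float]:
    """Map wiki_db -> minutes elapsed since the previous finished import."""
    events = []
    for line in lines:
        record = json.loads(line)
        match = FINISHED.search(record["msg"])
        if match:
            # Replace "Z" so fromisoformat also works on Python < 3.11.
            ts = datetime.fromisoformat(record["written_at"].replace("Z", "+00:00"))
            events.append((ts, match.group(1)))
    events.sort()
    return {
        wiki: (ts - prev_ts).total_seconds() / 60
        for (prev_ts, _), (ts, wiki) in zip(events, events[1:])
    }
```

Applied to the lines above, this gives roughly 239 minutes for dewiki (after elwiki) and roughly 10 minutes for hiwiki (after hewiki).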

Deployment completed. I've checked some of the wikis and they work fine.

Change #1226026 abandoned by Akaza24:

[research/mwaddlink@main] Filter country and continent names from link suggestions across all wikis

Reason:

pipelines are moved to another project

https://gerrit.wikimedia.org/r/1226026