
Fix omission of wikis in canonical_data.wikis
Closed, Resolved, Public

Event Timeline

mpopov triaged this task as High priority. Apr 26 2022, 5:13 PM
mpopov moved this task from Triage to Current Quarter on the Product-Analytics board.

Per discussion in the PA team's planning meeting, I'm assigning this to myself and moving it to our Kanban board. I'll work on this later this week.

nettrom_WMF changed the task status from Open to In Progress. May 2 2022, 8:53 PM
nettrom_WMF moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

Change 788429 had a related patch set uploaded (by Nettrom; author: Nettrom):

[mediawiki/extensions/WikimediaMessages@master] Add Doteli Wikipedia and Punjabi Wikisource

https://gerrit.wikimedia.org/r/788429

nettrom_WMF added a subscriber: jwang.

I reviewed whether we could use the SiteMatrix API and left this comment suggesting we don't, as the data doesn't really support our use case. I'll investigate whether there's a better source for canonical data. I also created this pull request, which identifies the wikis we lack English names for and adds those names manually (similar to how the notebook handles missing language names).
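To illustrate the approach in that pull request, here is a minimal sketch, not the actual notebook code. It assumes the keys in en.json look like `project-localized-name-<dbname>`; the function names and the override dictionary are hypothetical and only for illustration.

```python
import json

# Assumed key format in i18n/wikimediaprojectnames/en.json (an assumption).
KEY_PREFIX = "project-localized-name-"

# Hypothetical manual overrides for wikis with no entry in en.json.
MANUAL_ENGLISH_NAMES = {
    "dtywiki": "Doteli Wikipedia",
    "pawikisource": "Punjabi Wikisource",
}


def english_names(en_json_text):
    """Map database names to English names, skipping non-name keys such as @metadata."""
    messages = json.loads(en_json_text)
    return {
        key[len(KEY_PREFIX):]: value
        for key, value in messages.items()
        if key.startswith(KEY_PREFIX)
    }


def fill_missing_names(wiki_dbnames, names, manual_names):
    """Resolve each wiki's English name, falling back to the manual overrides.

    Returns the resolved names and the list of wikis that still have no name.
    """
    resolved = {}
    still_missing = []
    for dbname in wiki_dbnames:
        if dbname in names:
            resolved[dbname] = names[dbname]
        elif dbname in manual_names:
            resolved[dbname] = manual_names[dbname]
        else:
            still_missing.append(dbname)
    return resolved, still_missing
```

Anything that ends up in `still_missing` would need either a new manual override or a patch to en.json.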

Moving this to Needs Review so @jwang can review.

I've also updated the patch for WikimediaMessages in Gerrit so it also modifies qqq.json, which should make it pass build tests.

@tstarling @Jdforrester-WMF Product Analytics has been using Extension:WikimediaMessages's i18n/wikimediaprojectnames/en.json for a canonical dataset we use in reporting (to get the English names of various wikis). But since we discovered some current Wikis were missing, we're wondering whether this extension is maintained & reliable.

I saw your names listed as authors; do you know who or which teams are currently responsible for maintaining Extension:WikimediaMessages and WikimediaMessages?

> @tstarling @Jdforrester-WMF Product Analytics has been using Extension:WikimediaMessages's i18n/wikimediaprojectnames/en.json for a canonical dataset we use in reporting (to get the English names of various wikis). But since we discovered some current Wikis were missing, we're wondering whether this extension is maintained & reliable.

It's not really maintained by any one team. Historically it's been me and others helping on an as-needed basis. It's the responsibility of people creating new wikis to add the new entries to that file, but it looks like, although that is documented, it hasn't always been done.

> I saw your names listed as authors; do you know who or which teams are currently responsible for maintaining Extension:WikimediaMessages and WikimediaMessages?

No team is responsible for that extension, unfortunately. See the table at https://www.mediawiki.org/wiki/Developers/Maintainers#MediaWiki_extensions_deployed_at_Wikimedia_Foundation for details of which teams admit responsibility for what. This particular data set was originally created for cross-wiki notification messages, as part of the Collaboration Team's work (now nominally owned by Growth-Team, but I think they've had more than 100% staff turnover since they last worked on this).

I've merged the patch in question. Happy to help with further changes as needed!

Change 788429 merged by jenkins-bot:

[mediawiki/extensions/WikimediaMessages@master] Add Doteli Wikipedia and Punjabi Wikisource

https://gerrit.wikimedia.org/r/788429

The updated wikis.csv has now been loaded into canonical_data.wikis and I've confirmed that all the new/updated wikis are correctly present in the dataset.
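For anyone wanting to repeat that kind of check against the CSV, a rough sketch follows. The column names `database_code` and `english_name` are assumptions about the wikis.csv layout, and `missing_wikis` is a hypothetical helper, not something from the canonical-data repository.

```python
import csv
import io


def missing_wikis(csv_text, expected_db_codes,
                  key_column="database_code", name_column="english_name"):
    """Return the expected wikis that are absent or have an empty English name.

    The column names are assumptions about the CSV layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # A wiki counts as present only if its row has a non-empty English name.
    present = {row[key_column] for row in reader if row.get(name_column)}
    return sorted(set(expected_db_codes) - present)
```

An empty list from `missing_wikis` would mean every expected wiki has a row with an English name.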