Background Information
As part of the next steps for Parsoid Read Views under the 5.1 KR, we should test Parsoid on the wikis hosted in Incubator and turn Parsoid on by default if no issues are found.
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T378477 [EPIC] Roll-out Parsoid to Incubator Wikis and newly created wikis
Resolved | | ssastry | T378365 VisualDiff testing for Incubator wikis
It looks like Incubator hosts a large number of wikis, and wikis that graduate to become their own wiki don't get their pages removed from Incubator. So we need to factor all of this in when creating a suitable random page sample from Incubator. Ideally, we would have a test sample of between 10K and 30K pages across all wikis. For this to be a meaningful sample, we may have to narrow our focus to a smaller subset of wikis, for example the most active ones. We can also use https://incubator.wikimedia.org/wiki/Incubator:Site_creation_log to guide our sampling decisions: the wikis listed there have already graduated, so we can exclude them from our sampling, since Parsoid will not be enabled by default on those wikis.
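A minimal sketch of the sampling step described above, assuming the Incubator titles have already been loaded as a list of strings such as `Wp/mag/Some_page`, and that the prefixes of graduated wikis have been copied by hand from the site creation log. The helper names `wiki_prefix` and `sample_titles`, and the fixed seed, are hypothetical choices for illustration, not part of any existing tool.

```python
import random


def wiki_prefix(title: str) -> str:
    """Return the two-part Incubator prefix of a title, e.g. 'Wp/mag' for 'Wp/mag/Foo'."""
    return "/".join(title.split("/", 2)[:2])


def sample_titles(titles, excluded_prefixes, target_size, seed=0):
    """Drop titles from excluded (e.g. graduated) wikis, then draw a random sample.

    If fewer than target_size titles remain, all remaining titles are returned
    (in shuffled order).
    """
    excluded = set(excluded_prefixes)
    pool = [t for t in titles if wiki_prefix(t) not in excluded]
    rng = random.Random(seed)  # fixed seed so the same sample can be regenerated for reruns
    return rng.sample(pool, min(target_size, len(pool)))
```

For example, `sample_titles(all_titles, {"Wp/zz"}, 20_000)` (with `Wp/zz` standing in for a hypothetical graduated wiki) would return up to 20K titles with that wiki's pages removed. Seeding the generator keeps the test set reproducible when the diffs are rerun after fixes.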
Actually, I was not entirely right about deletion: it looks like pages for graduated wikis are eventually removed from Incubator. https://incubator.wikimedia.org/wiki/Incubator:Site_creation_log shows that Incubator pages do get deleted (the most recent deletion for a wiki is logged on 24th April 2024, although some wikis that graduated earlier have no deletion log).
It turns out that Wikisource incubator wikis are not hosted on incubator.wikimedia.org, which is lucky for us: Parsoid doesn't fully support Wikisource yet, so we don't have to look into Wikisource wikis for now.
There are 15 active wikis on Incubator. If we wanted to, we could also include an additional 5 Wikipedias that recently graduated from Incubator to their own wikis. See below for the prefix list, extracted from https://incubator.wikimedia.org/wiki/Incubator:Featured_wikis and https://incubator.wikimedia.org/wiki/Incubator:Site_creation_log#2024. I'll download the complete title dump from dumps.wikimedia.org and extract the titles for these wikis by matching on these title prefixes. This will create a baseline title set from just these wikis, which I can then randomly sample to create our visual diff test set.
Most active incubator wikipedias
Wp/isv
Wp/knc
Wp/lua
Wp/mag
Wp/mrt
Wp/nup
Wp/rki
Wp/syl
Wp/tig
Most active incubator wikivoyages
Wy/id
Wy/hbs
Other active projects
Wt/ary
Wb/bcl
Wq/ha
Wp/grc
Planned to graduate soon
Wt/tcy
Recently graduated incubator wikipedias (Oct 2024)
Wp/ann
Wp/iba
Wp/tdd
Wp/rsk
Wp/nr
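The prefix extraction described above can be sketched as follows. It assumes the dump is a gzip-compressed file with one page title per line (as in the `all-titles-in-ns0` dumps, which use underscores in place of spaces and may begin with a header line that simply fails to match any prefix); the function names are hypothetical. Matching on the bare prefix or on `prefix + "/"` avoids picking up titles from a wiki whose code merely starts with the same letters (e.g. `Wp/nr` would otherwise also match any longer code beginning with `nr`).

```python
import gzip
from collections import Counter

# The 21 prefixes listed above.
PREFIXES = [
    "Wp/isv", "Wp/knc", "Wp/lua", "Wp/mag", "Wp/mrt", "Wp/nup", "Wp/rki",
    "Wp/syl", "Wp/tig", "Wy/id", "Wy/hbs", "Wt/ary", "Wb/bcl", "Wq/ha",
    "Wp/grc", "Wt/tcy", "Wp/ann", "Wp/iba", "Wp/tdd", "Wp/rsk", "Wp/nr",
]


def matching_prefix(title, prefixes):
    """Return the prefix that title belongs to, or None if it belongs to none."""
    for prefix in prefixes:
        if title == prefix or title.startswith(prefix + "/"):
            return prefix
    return None


def extract_titles(lines, prefixes=PREFIXES):
    """Keep only titles under the given wiki prefixes and tally titles per prefix."""
    selected, counts = [], Counter()
    for line in lines:
        title = line.rstrip("\n")
        prefix = matching_prefix(title, prefixes)
        if prefix is not None:
            selected.append(title)
            counts[prefix] += 1
    return selected, counts


def read_dump(path):
    """Yield lines from a gzip-compressed title dump (assumed one title per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from f
```

Usage, assuming a locally downloaded file named `incubatorwiki-latest-all-titles-in-ns0.gz`: `titles, counts = extract_titles(read_dump("incubatorwiki-latest-all-titles-in-ns0.gz"))`.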
This gives us a total of 40821 titles to sample from, spread across those 21 wikis as follows. We could conceivably run diffs on all 40K titles, since most pages are likely small.
Prefix | Titles
---|---
Wp/mag | 7696
Wp/grc | 5080
Wp/syl | 3494
Wt/tcy | 3181
Wy/id | 2941
Wp/rki | 2687
Wp/iba | 2314
Wq/ha | 2301
Wt/ary | 2117
Wp/isv | 2095
Wp/knc | 1413
Wp/lua | 1171
Wp/tdd | 1059
Wp/nup | 695
Wp/tig | 594
Wp/rsk | 529
Wp/ann | 479
Wy/hbs | 333
Wp/nr | 268
Wp/mrt | 230
Wb/bcl | 144
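As a quick consistency check on the per-prefix counts listed above, they can be summed and turned into shares of the pool. The dictionary below simply transcribes those counts; the check confirms that the 21 wikis account for exactly 40821 titles.

```python
# Per-prefix title counts, transcribed from the listing above.
COUNTS = {
    "Wp/mag": 7696, "Wp/grc": 5080, "Wp/syl": 3494, "Wt/tcy": 3181,
    "Wy/id": 2941, "Wp/rki": 2687, "Wp/iba": 2314, "Wq/ha": 2301,
    "Wt/ary": 2117, "Wp/isv": 2095, "Wp/knc": 1413, "Wp/lua": 1171,
    "Wp/tdd": 1059, "Wp/nup": 695, "Wp/tig": 594, "Wp/rsk": 529,
    "Wp/ann": 479, "Wy/hbs": 333, "Wp/nr": 268, "Wp/mrt": 230,
    "Wb/bcl": 144,
}

total = sum(COUNTS.values())
print(f"{len(COUNTS)} wikis, {total} titles")  # prints "21 wikis, 40821 titles"

# Share of the pool contributed by each wiki, largest first.
for prefix, n in sorted(COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{prefix:8} {n:5} {n / total:6.1%}")
```

Printing the shares makes the skew visible: the four largest wikis (Wp/mag, Wp/grc, Wp/syl, Wt/tcy) hold 19451 titles, nearly half the pool, so a uniform random sample over all titles would draw mostly from them.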
The visual diff run is done and we have results. We can rerun the diffs later once we've fixed any relevant issues.