
Identify archived extensions on mediawiki.org but not on Gerrit (and vice versa)
Open, Medium, Public

Description

I've noticed that some extensions were tagged as "archived" on mediawiki.org, yet their repos were not archived on Gerrit. The reverse may also be happening. We should compile the lists of extensions marked as archived on Gerrit and on mediawiki.org, join them, identify which ones are archived in one place but not in the other, and fix those anomalies.

T190286 was an instance of this. I suspect there are many more.

Event Timeline

Restricted Application added a subscriber: Aklapper. Mar 26 2018, 11:03 AM
MarcoAurelio triaged this task as Medium priority. Mar 26 2018, 11:03 AM
MarcoAurelio updated the task description.

In theory, my idea was something like this:

curl -s "https://gerrit.wikimedia.org/r/projects/" > gerritprojects.json
sed -i '1d' gerritprojects.json # drop Gerrit's )]}' prefix line, which isn't valid JSON
jq -r '.[] | select(.state=="READ_ONLY") | .id' gerritprojects.json > gerritprojects.rawlist # get read-only repos
grep '^mediawiki%2Fextensions%2F' gerritprojects.rawlist > gerritextensions.rawlist # keep only extension repos
cut -c 26- gerritextensions.rawlist > gerritextensions.list # cut the 25-character mediawiki%2Fextensions%2F prefix
sed -i -e 's/%2F/\//g' gerritextensions.list # replace %2F by /
sort gerritextensions.list -o gerritextensions.list

curl -s "https://www.mediawiki.org/w/api.php?action=query&list=backlinks&bltitle=Template:Archived_extension&bllimit=max&blnamespace=102&format=json" > mwextensions.json
# TODO: results are capped at 500 per request; follow the "continue" values in the response to get the rest
jq -r '.query.backlinks[].title' mwextensions.json > mwextensions.rawlist # get names of extension pages
cut -c 11- mwextensions.rawlist > mwextensionsandsubpages.list # cut the 10-character "Extension:" prefix
sed '/\//d' mwextensionsandsubpages.list > mwextensions.list # remove translation subpages etc. (any title containing /)
sort mwextensions.list -o mwextensions.list

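comm -23 gerritextensions.list mwextensions.list # read-only on Gerrit but not tagged as archived on mediawiki.org
comm -13 gerritextensions.list mwextensions.list # tagged as archived on mediawiki.org but not read-only on Gerrit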
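For the TODO about the 500-result cap, one option is to keep requesting while the API response contains a `continue` object, appending its key/value pairs to the next request. A rough bash sketch, assuming `curl` and `jq` are installed and reusing the same backlinks query as above; the function names `continue_params` and `fetch_all_backlinks` are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch: fetch all backlink titles, following MediaWiki API continuation.
# Assumes curl and jq are installed; function names are illustrative only.
set -euo pipefail

API="https://www.mediawiki.org/w/api.php"
BASE="action=query&list=backlinks&bltitle=Template:Archived_extension&bllimit=max&blnamespace=102&format=json"

# Read one API response on stdin and print its "continue" object as a
# query-string suffix (each value percent-encoded with jq's @uri).
# Prints an empty line when there is no "continue" object (last batch).
continue_params() {
  jq -r 'if .continue then (.continue | to_entries | map("&\(.key)=\(.value | @uri)") | join("")) else "" end'
}

# Request batch after batch and print one page title per line.
fetch_all_backlinks() {
  local cont="" page
  while :; do
    page=$(curl -s "${API}?${BASE}${cont}")
    printf '%s\n' "$page" | jq -r '.query.backlinks[]?.title'
    cont=$(printf '%s\n' "$page" | continue_params)
    [ -n "$cont" ] || break
  done
}

# Usage (performs network requests):
#   fetch_all_backlinks > mwextensions.rawlist
```

Passing every key of the `continue` object back, rather than only `blcontinue`, is the generic way to continue a query, so the same loop should still work if more list modules are added to the request.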

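In practice, that requires much more work, because you want to compare the list of read-only/archived extensions in place A against the full list of extensions in place B, and vice versa. Comparing only the two archived lists can't tell an extension that is active in B apart from one that doesn't exist there at all.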
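The comparison against full lists can be done with `comm` on sorted files: take what is archived in A, keep only what exists at all in B, then drop what is already archived in B. A minimal sketch, assuming four sorted, one-name-per-line files with hypothetical names (`gerrit_all.list`, `gerrit_archived.list`, `mw_all.list`, `mw_archived.list`) whose entries are already directly comparable; `archived_mismatch` is likewise a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: find extensions archived in one place but not in the other,
# counting only extensions that actually exist in both places.
# All inputs must be sorted, one name per line; file names are hypothetical.
set -euo pipefail

# archived_mismatch ARCHIVED_A ALL_B ARCHIVED_B
# Prints names archived in A that exist in B but are not archived in B.
archived_mismatch() {
  # comm -12: names present in both ARCHIVED_A and ALL_B.
  # comm -23 - ARCHIVED_B: of those, keep the ones not in ARCHIVED_B.
  comm -12 "$1" "$2" | comm -23 - "$3"
}

# Usage:
#   archived_mismatch gerrit_archived.list mw_all.list mw_archived.list     # needs fixing on mediawiki.org
#   archived_mismatch mw_archived.list gerrit_all.list gerrit_archived.list # needs fixing on Gerrit
#   comm -23 mw_archived.list gerrit_all.list  # archived on-wiki, no Gerrit repo found at all
```

The last line covers the case the archived-vs-archived comparison can't see: pages with no matching repo, which may need manual checking rather than archiving.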

Yes, something like that, and then mark them as archived as appropriate (either on MW or on Gerrit). I don't know whether this could be automated, though. Doing it manually can be a real pain.


Well, if the syntax on both pages is the same, we can easily see the differences using Special:ComparePages. If the layouts differ a lot, the comparison will be very noisy.

Maybe some on SourceForge/Pirate Bay too?

@Liuxinyu970226: I'd say that's out of scope for this task, but anyone [else] is free to investigate it.

Mainframe98 added a subscriber: Mainframe98. Edited Aug 5 2018, 10:58 AM

I wrote a quick (and probably dirty) program to find all extensions that were archived on MediaWiki.org but not on Gerrit.
Results:

EnhanceContactForm
Link Suggest
MwEmbedSupport
Pdf Export
SecureHTML
Semantic Expressiveness
Semantic Genealogy
Semantic Sifter
SemanticHighcharts

EnhanceContactForm has some issues: https://gerrit.wikimedia.org/g/project:mediawiki/extensions/EnhanceContactForm yields a blank page, but https://gerrit.wikimedia.org/r/projects/ reports the repo as active.
Link Suggest doesn't refer to the Gerrit repo on its extension page.
MwEmbedSupport was only recently archived, so its repo may simply not have been archived yet, or, since 1.32 isn't out yet, it may be kept while 1.32 development is ongoing. See T197918: Archive the MwEmbedSupport extension
Pdf Export: done in T202220: Archive the PDF Export extension
SecureHTML is a false positive due to a name conflict.
Semantic Expressiveness is handled in T199417: Archive the Semantic Expressiveness extension.
Semantic Genealogy is handled in T199419: Archive the Semantic Genealogy extension.
Semantic Sifter is handled in T199423: Archive the Semantic Sifter extension.
SemanticHighcharts is handled in T199421: Archive the SemanticHighcharts extension.
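Some hits like these can come from naming rather than archival state: wiki page titles such as `Link Suggest` contain spaces, which Gerrit repo names typically don't. A hedged way to reduce that noise is to normalize both lists before running `comm`, assuming (not guaranteed) that a repo name matches the page title once spaces and underscores are removed and case is ignored. It would not help with a genuine name conflict like SecureHTML. `normalize_names` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: normalize extension names so wiki titles and repo names compare equal.
# Assumption (not guaranteed): repo name == page title without spaces/underscores,
# compared case-insensitively.
set -euo pipefail

# Read names on stdin; print them without spaces/underscores, lowercased,
# sorted and de-duplicated (ready for comm).
normalize_names() {
  tr -d ' _' | tr '[:upper:]' '[:lower:]' | sort -u
}

# Usage (bash process substitution):
#   comm -13 <(normalize_names < gerritextensions.list) <(normalize_names < mwextensions.list)
```

The output is in normalized form, so matching a hit back to its original page title still has to be done by hand or with a lookup table.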

I couldn't find any skins that were archived on MediaWiki.org, but not in Gerrit.