
Get download / clone stats for MediaWiki and other repos
Open, Medium, Public

Description

It would be nice to know how many people download MediaWiki via tarball or git, and how the numbers changed over time.

This does not really say much about MediaWiki usage (many downloads would be automated one-time things like continuous integration builds, and many MediaWiki installs are created via a third-party provider such as TurnKey and do not involve getting MediaWiki from our servers), but having the numbers would be interesting nevertheless.

Some things to look at:

  • releases.wikimedia.org downloads (used since 2014 as canonical download location for releases) - 135K / month (as of February 2018)
  • sourceforge.net and download.wikimedia.org (used to be the canonical download locations for releases in the past) - SourceForge peak download rate was 60-70K per month in 2007 (stats)
  • gerrit clones (canonical location for master / developer setups)
  • ExtensionDistributor downloads (canonical location for extensions) - ~15K core and ~15K extensions and skins a month (dashboard)
  • Github tarball downloads (canonical location for master branch of extensions)
  • Github clones (not really exposed, except on Github)
  • phabricator clones
  • gitiles clones/tarballs (new thing we use with gerrit)
  • old thing we used with gerrit, whatever that was

(or just assume that most of those are barely used, which is probably the case)
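For the Github clones item: for repositories where one has push access, the numbers GitHub shows on its traffic page are also available from the REST endpoint `GET /repos/{owner}/{repo}/traffic/clones`, but only for the last 14 days, so they would have to be collected periodically and accumulated. A minimal Python sketch of that accumulation; the token handling and the function names are assumptions, not an existing tool:

```python
import json
import urllib.request
from typing import Dict


def fetch_clone_traffic(owner: str, repo: str, token: str) -> dict:
    """Fetch daily clone counts for the last 14 days (requires push access)."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/traffic/clones?per=day",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def merge_clone_snapshot(history: Dict[str, int], snapshot: dict) -> Dict[str, int]:
    """Fold one traffic/clones response into a running per-day history.

    Consecutive snapshots overlap (each covers 14 days), so entries are keyed
    by their timestamp and the newer snapshot's count replaces the older one;
    the most recent day in an older snapshot may have been incomplete.
    """
    for day in snapshot.get("clones", []):
        history[day["timestamp"]] = day["count"]
    return history
```

Run it on a schedule (e.g. daily from cron), persisting `history` between runs; `sum(history.values())` then gives the clones observed since collection started.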

See also:

Event Timeline

> old thing we used with gerrit, whatever that was

Do you mean Diffusion, Gitblit, or Gitweb? We've used all 3 ;-)

Tgr updated the task description.

Kunal made a nice dashboard for ExtensionDistributor downloads: https://grafana.wikimedia.org/dashboard/db/extension-distributor-downloads?orgId=1&from=20161101&to=now&var-release=All&var-groupby=1M

It seems like the skin/extension data broke last November. Or maybe it was fixed then? Anyway, assuming the old data is correct, a typical month has about 13-15K core downloads, a similar number of extension downloads, and 1200-1300 skin downloads.

Looking at the history of the Download page and download link templates the canonical release download location was:

  • SourceForge (prdownloads.sourceforge.net) until February 2007
  • download.wikimedia.org until February 2014 (also noc.wikimedia.org for a short while)
  • releases.wikimedia.org since then

For extensions, the canonical download locations are currently ExtensionDistributor and Github tarballs (for master).

Amazingly, there are still some downloads from SourceForge, even though the latest version there is 1.9.2 (are those bots?). It was mostly used between 2003 and 2010, though:

[Chart: SourceForge download statistics for MediaWiki (sourceforge mw downloads.png)]

The big drop in 2007 is when it was replaced on the download page.

Monthly download count at peak usage was 60-70K, total downloads during project lifetime around 1.5M.

Tgr updated the task description.
0: jdbc:hive2://analytics1003.eqiad.wmnet:100> SELECT count(*) FROM webrequest WHERE uri_host = 'releases.wikimedia.org' AND uri_path RLIKE '^/mediawiki/.*\.tar\.gz' AND year = 2018 AND month = 1;

...
134576
1 row selected (3447.245 seconds)

Would be nice to extract that data into some more manageable format.
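One possible "more manageable format" is a small per-month table computed once and stored, instead of re-scanning webrequest each time (the query above took 3447 seconds, almost an hour). A minimal Python sketch of that aggregation, assuming the relevant columns have already been exported as (timestamp, host, path) tuples; the tuple layout and the function name are illustrative only. The same grouping could equally be done in Hive itself with a GROUP BY on year and month.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Same filter as the Hive query above, plus a "$" anchor (our addition):
# RLIKE matches substrings, so without it ".tar.gz.sig" paths would count too.
TARBALL_PATH = re.compile(r"^/mediawiki/.*\.tar\.gz$")


def monthly_tarball_counts(
    requests: Iterable[Tuple[datetime, str, str]],
) -> Dict[Tuple[int, int], int]:
    """Count releases.wikimedia.org tarball requests per (year, month).

    `requests` is a hypothetical export of (timestamp, uri_host, uri_path)
    rows from webrequest; the tuple layout is an assumption for this sketch.
    """
    counts: Counter = Counter()
    for ts, host, path in requests:
        if host == "releases.wikimedia.org" and TARBALL_PATH.search(path):
            counts[(ts.year, ts.month)] += 1
    return dict(sorted(counts.items()))
```

The resulting mapping from (year, month) to count is small enough to be written out as CSV once a month and charted over time.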

Github exposes download numbers via the API, but it seems broken - even though MediaWiki clearly has releases, the API does not seem to be aware of them.

> Github exposes download numbers via the API, but it seems broken - even though MediaWiki clearly has releases, the API does not seem to be aware of them.

So apparently it only recognizes releases as such if you manually add release notes for them... and it only provides download stats for releases, not tag tarballs. Filed T186986: Create Github releases for MediaWiki (and maybe extensions).
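If Github releases are created as proposed in T186986, the counts would come from the REST endpoint `GET /repos/{owner}/{repo}/releases`, where each release lists its uploaded `assets`, each with a `download_count`. The automatically generated source archives (`tarball_url`, `zipball_url`) carry no counter, so presumably the tarball would have to be uploaded as an asset for it to be counted at all. A minimal sketch, assuming the `wikimedia/mediawiki` mirror as the repository and reading only the first page of results; the function names are illustrative:

```python
import json
import urllib.request
from typing import Dict, List


def fetch_releases(owner: str, repo: str) -> List[dict]:
    """Fetch the first page (up to 100) of releases from the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/releases?per_page=100"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def asset_downloads(releases: List[dict]) -> Dict[str, int]:
    """Sum download_count over the uploaded assets of each release.

    Auto-generated source archives (tarball_url / zipball_url) have no
    download_count, so a release without uploaded assets counts as 0.
    """
    return {
        rel["tag_name"]: sum(a["download_count"] for a in rel.get("assets", []))
        for rel in releases
    }
```

For example, `asset_downloads(fetch_releases("wikimedia", "mediawiki"))` would map each release tag to its total asset downloads; unauthenticated API calls are rate-limited, so a token would be needed for anything beyond occasional use.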

For extensions this is a problem since we use Github to provide master snapshots, and there is no way to get any download stats about those. @demon do you think it would be possible to replace github tarball links with gitiles tarball links in the extension infoboxes (the WikimediaDownload template, specifically) and get statistics on how many people actually use them?

> For extensions this is a problem since we use Github to provide master snapshots, and there is no way to get any download stats about those.

Why do we need to use Github? What snapshots are we talking about? Extension distributor should be able to do this for us....

> @demon do you think it would be possible to replace github tarball links with gitiles tarball links in the extension infoboxes (the WikimediaDownload template, specifically) and get statistics on how many people actually use them?

I really really don't want to do this. I disabled tarball links in Gitblit (and Gitweb before that) on purpose. It's a DoS vector for people to generate archives for $random_sha1s on demand. Gitiles and most other git viewers don't cache these for future requests.

Isn't the ext-dist service on wmflabs designed for this already? That's what we plug Extension Distributor into.

> Why do we need to use Github? What snapshots are we talking about? Extension distributor should be able to do this for us....

I don't know if we need it, but right now the top two links in WikimediaDownload (the template used for all gerrit extension infoboxes) are a link to ExtensionDistributor and a link to the Github tarball, with the wording implying that the latter is the way to get a master tarball.

> I really really don't want to do this. I disabled tarball links in Gitblit (and Gitweb before that) on purpose. It's a DoS vector for people to generate archives for $random_sha1s on demand. Gitiles and most other git viewers don't cache these for future requests.

> Isn't the ext-dist service on wmflabs designed for this already? That's what we plug Extension Distributor into.

It certainly can do it (although there doesn't seem to be a way to create a direct link which selects the "master" option in the ExtensionDistributor interface). Not sure if that's deemphasized intentionally or maybe the template was just written before ExtensionDistributor learned to do that.

> Why do we need to use Github? What snapshots are we talking about? Extension distributor should be able to do this for us....

> I don't know if we need it, but right now the top two links in WikimediaDownload (the template used for all gerrit extension infoboxes) are a link to ExtensionDistributor and a link to the Github tarball, with the wording implying that the latter is the way to get a master tarball.

I think we could get Extension Distributor speaking master, if it doesn't already (long as we cache on the sha1) :)
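The approach discussed in this exchange (offer master snapshots, but never build archives on demand for arbitrary commits, and cache on the SHA-1) can be sketched as follows. This is a hypothetical illustration, not how ExtensionDistributor or the ext-dist service actually works; `SnapshotCache`, `resolve` and `build` are made-up names standing in for whatever git plumbing a real service would use.

```python
from typing import Callable, Dict, Set


class SnapshotCache:
    """Serve branch snapshots, building each archive at most once per commit.

    Only allow-listed branch names are accepted, so a client cannot trigger
    an archive build for an arbitrary SHA-1. Each branch is resolved to its
    current commit and the archive is cached under that SHA-1, so repeated
    requests for an unchanged master cost one lookup and no rebuild.
    """

    def __init__(
        self,
        resolve: Callable[[str], str],    # hypothetical: branch -> head SHA-1
        build: Callable[[str], bytes],    # hypothetical: SHA-1 -> tarball bytes
        allowed_branches: Set[str],
    ) -> None:
        self._resolve = resolve
        self._build = build
        self._allowed = set(allowed_branches)
        # Unbounded for brevity; a real service would evict superseded commits.
        self._cache: Dict[str, bytes] = {}

    def get(self, branch: str) -> bytes:
        if branch not in self._allowed:
            raise ValueError(f"no snapshots offered for {branch!r}")
        sha1 = self._resolve(branch)
        if sha1 not in self._cache:
            self._cache[sha1] = self._build(sha1)
        return self._cache[sha1]
```

Because the cache key is the commit rather than the branch name, a new master commit produces exactly one new archive, and requests between commits are served from the cache.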

> I really really don't want to do this. I disabled tarball links in Gitblit (and Gitweb before that) on purpose. It's a DoS vector for people to generate archives for $random_sha1s on demand. Gitiles and most other git viewers don't cache these for future requests.

> Isn't the ext-dist service on wmflabs designed for this already? That's what we plug Extension Distributor into.

> It certainly can do it (although there doesn't seem to be a way to create a direct link which selects the "master" option in the ExtensionDistributor interface). Not sure if that's deemphasized intentionally or maybe the template was just written before ExtensionDistributor learned to do that.

I think this is fixable :)

ExtensionDistributor already generates master tarballs...

So can we just use it to replace the Github link?