Looks pretty trivial to get this started in Cloud Services for now. Ideally this will help with CI reliability, remove an external dependency, and use our resources to help the rest of the ecosystem.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Legoktm | T203529 Set up a packagist mirror for Wikimedia
Resolved | | bd808 | T203533 Request creation of packagist-mirror VPS project
Event Timeline
Mentioned in SAL (#wikimedia-cloud) [2018-11-26T05:54:50Z] <legoktm> created packagist-mirror1, cloned https://github.com/Webysther/packagist-mirror and started mirror creation script in a screen (T203529)
So the mirror creation is done, I installed apache2, and symlinked /var/www/html to /srv/packagist-mirror/public/ and verified that curl localhost works. But I'm unable to reach port 80 from outside of the instance, whether via DNS proxy or from inside another cloud vps (toolforge in this case). Probably an issue with security groups, but I don't see anything obvious. On IRC, arturo offered to take a look <3.
I added a rule allowing HTTP (80/tcp) traffic to the instance from anywhere to the security group.
Before and after:
    aborrero@tools-bastion-03:~$ telnet packagist-mirror1.eqiad.wmflabs 80
    Trying 172.16.1.212...
    ^C
    aborrero@tools-bastion-03:~$ telnet packagist-mirror1.eqiad.wmflabs 80
    Trying 172.16.1.212...
    Connected to packagist-mirror1.eqiad.wmflabs.
    Escape character is '^]'.
    ^C^C^C
In fact, I just created a separate security group, called HTTP/HTTPS, instead of adding rules to the default one.
This is the resulting instance configuration:
    aborrero@tools-bastion-03:~$ telnet packagist-mirror1.eqiad.wmflabs 80
    Trying 172.16.1.212...
    Connected to packagist-mirror1.eqiad.wmflabs.
    Escape character is '^]'.
    GET /
    <!DOCTYPE html>
    <html>
    [...]
    <p>
    This is PHP package repository Packagist.org mirror site.
    </p>
    [...]
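The before/after telnet checks above boil down to a TCP connect test. A minimal Python sketch of the same check, in case it is handy to script; `port_is_open` is a hypothetical helper name, not something from the mirror project:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. packets dropped by a
        # security group), and DNS resolution failures.
        return False
```

For example, `port_is_open("packagist-mirror1.eqiad.wmflabs", 80)` run from another Cloud VPS instance would correspond to the second telnet attempt. A timeout rather than an immediate refusal usually points at filtering, which matches the hanging "Trying..." seen before the rule was added.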
packagist.org had some issue on Friday (T226253), I guess it is a project we might want to revive?
OK, a few hours of fiddling with Apache, and I've gotten this working!
Copying it here just in case so it doesn't get lost.
    <Directory /srv/packagist-mirror/public>
        Require all granted
        RewriteEngine on
        # Serve correct content types, and prevent mod_deflate double gzip.
        RewriteRule "\.json$" "-" [T=application/json,E=no-gzip:1]
        <FilesMatch "\.json$">
            # Serve correct encoding type.
            Header append Content-Encoding gzip
            # Force proxies to cache gzipped &
            # non-gzipped json files separately.
            Header append Vary Accept-Encoding
        </FilesMatch>
    </Directory>
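To illustrate why the config sets `no-gzip`: assuming, as the `Content-Encoding gzip` header implies, that the `.json` files on disk already contain gzip-compressed data, a second compression pass by mod_deflate would leave clients with gzip bytes instead of JSON after decoding. A small sketch of that effect, with a made-up payload:

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # First two bytes of any gzip stream.

# Made-up stand-in for a metadata file.
payload = json.dumps({"packages": []}).encode("utf-8")

# Assumption: the mirror stores the file already gzip-compressed.
on_disk = gzip.compress(payload)

# What a client would receive if the server compressed the response again.
double_encoded = gzip.compress(on_disk)

# A client undoes one Content-Encoding layer...
after_one_pass = gzip.decompress(double_encoded)

# ...and is left with gzip bytes rather than JSON.
still_compressed = after_one_pass.startswith(GZIP_MAGIC)

# With no-gzip set, the on-disk bytes are sent as-is and one pass is enough.
decoded = json.loads(gzip.decompress(on_disk))
```

Here `still_compressed` ends up true, while `decoded` is the original JSON document.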
OK, it works. Instructions at https://packagist-mirror.wmflabs.org/
A word of caution: if the mirror script (or one of its dependencies) is malicious, it can inject or execute malicious code. I've finished auditing the mirror code itself, but not its dependencies. I've sent a PR to get rid of one dependency so far.
    km@km-pt ~> curl --compressed -I 'https://packagist-mirror.wmflabs.org/packages.json' | grep last-modified
    last-modified: Wed, 26 Jun 2019 16:33:02 GMT
It seems like every time the mirror script runs (every 5 minutes), it's touching all the files, updating last-modified values, and preventing basic caching from working...
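If the script really were rewriting files with identical content, one common way to keep `Last-Modified` stable would be to skip the write when nothing changed. The sketch below shows that general technique only; it is not taken from the packagist-mirror code, and `write_if_changed` is a hypothetical helper:

```python
import os
import tempfile
from pathlib import Path


def write_if_changed(path: Path, data: bytes) -> bool:
    """Write data to path only when the content differs. Return True if written."""
    try:
        if path.read_bytes() == data:
            return False  # Identical content: leave the file and its mtime alone.
    except FileNotFoundError:
        pass  # New file, fall through and create it.

    # Write to a temp file in the same directory, then rename atomically so
    # the web server never sees a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # mkstemp creates the file as 0600; make it readable by the web server.
        os.chmod(tmp_name, 0o644)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```

With this approach the modification time only moves when the bytes do, so conditional requests (`If-Modified-Since`) can be answered with a 304.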
So far illuminate/support and nesbot/carbon don't seem to be used (issue).
Remaining stuff:
    ✔️ guzzlehttp/guzzle 6.3.3 MIT
    ✔️ guzzlehttp/promises v1.3.1 MIT
    ✔️ guzzlehttp/psr7 1.4.2 MIT
    league/flysystem 1.0.52 MIT
    league/flysystem-cached-adapter 1.0.9 MIT
    ✔️ php-snippets/circular-array v1.0.0 MIT
    ✔️ psr/cache 1.0.1 MIT
    ✔️ psr/http-message 1.0.1 MIT
    ✔️ psr/log 1.1.0 MIT
    ✔️ sebastian/version 2.0.1 BSD-3-Clause
    ✔️ symfony/console v3.4.27 MIT
    ✔️ symfony/debug v4.2.8 MIT
    ✔️ symfony/polyfill-mbstring v1.11.0 MIT
    vlucas/phpdotenv v2.4.0 BSD-3-Clause-Attribution
(I didn't actually review the symfony stuff, just assuming it's safe)
Something I have noticed is that the files listing the packages keep changing. Entries appear and disappear as code is updated or new tags are pushed. One example is rackbeat/laravel-morph-where-has, which cut a new tag while I was testing my theory, after a year or so of inactivity. Its definition thus moved from p-provider-2018-07.json to p-provider-latest.json, forcing both files to be downloaded again.
The /packages.json file is an index of all those provider files and lists their checksums. Thus each time a package somewhere is updated, one of the provider files changes and packages.json has the corresponding sha256 updated.
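A rough sketch of that mechanism, to show why one new tag invalidates several files at once. The `provider-includes` key follows my recollection of the Composer v1 metadata layout and should be read as an assumption; the file contents are invented for illustration:

```python
import hashlib
import json


def build_index(providers: dict[str, bytes]) -> dict:
    """Map each provider file name to the sha256 of its content."""
    return {
        "provider-includes": {
            name: {"sha256": hashlib.sha256(content).hexdigest()}
            for name, content in sorted(providers.items())
        }
    }


# Invented contents; only the file names come from the example above.
providers = {
    "p-provider-2018-07.json": b'{"providers": {"rackbeat/laravel-morph-where-has": {}}}',
    "p-provider-latest.json": b'{"providers": {}}',
}
before = build_index(providers)

# The package tags a new release: its entry moves to the "latest" file.
providers["p-provider-2018-07.json"] = b'{"providers": {}}'
providers["p-provider-latest.json"] = b'{"providers": {"rackbeat/laravel-morph-where-has": {}}}'
after = build_index(providers)

# Provider files whose checksum changed, and whether the index itself changed.
changed = [
    name
    for name in providers
    if before["provider-includes"][name] != after["provider-includes"][name]
]
index_changed = json.dumps(before, sort_keys=True) != json.dumps(after, sort_keys=True)
```

Both provider files land in `changed`, and `index_changed` is true, so a client has to fetch packages.json plus the two provider files again.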
So I guess it is working as intended?
Besides that, the zip/tarballs are reused from the local cache.
If you need help, I'm here. I'm already helping with photos and edits, but never imagined that I could help with OSS inside Wikimedia.
Yes, composer is optimized for downloading compressed files, and the cache is only optimized for zip/tarballs. JSON without *.gz is only for legacy support.
Hey!
Geographically? It's in Virginia, USA. Specifically https://wikitech.wikimedia.org/wiki/Eqiad_cluster
It should be running every 5 minutes.
No, it's using your code off of github, I just hadn't pulled it in a while (just did so now).
Probably should've closed this a while back, the mirror has been running for a while and works just fine.
I am looking for new maintainers though, T296968: Add more maintainers for packagist-mirror.