Running red-queen-style
A thousand changes, almost none of them visible

I've spent the last few months building new web servers to support some of the basic WMCS web services: Wikitech, Horizon, and Toolsadmin. The new Wikitech service is already up and running; on Wednesday I hope to flip the last switch and move all public Horizon and Toolsadmin traffic to the new servers as well.

If everything goes as planned, users will barely notice this change at all.

This is a lot of what our team does -- running as fast as we can just to stay in place. Software doesn't last forever -- it takes a lot of effort just to hold things together. Here are some of the problems that this rebuild is solving:

  • T186288: Operating System obsolescence. Years ago, the Wikimedia Foundation Operations team resolved to move all of our infrastructure from Ubuntu to Debian Linux. Ubuntu Trusty will stop receiving security updates in about a year, so we have to stop using it by then. All three services (Wikitech, Horizon, Toolsadmin) were running on Ubuntu servers; Wikitech was the last of the Foundation's MediaWiki hosts to run on Ubuntu, so its upgrade should let us drop all kinds of special cases in the future.
  • T98813: Keeping up with PHP and HHVM. In addition to being the last wiki on Trusty, Wikitech was also the last wiki on PHP 5. Every other wiki is using HHVM and, with the death of the old Wikitech, we can finally stop supporting PHP 5 internally. Better yet, this plays a part in unblocking the entire MediaWiki ecosystem (T172165) as newer versions of MediaWiki standardize on HHVM or PHP 7.
  • T168559: Escaping failing hardware. The old Wikitech site was hosted on a machine named 'Silver'. Hardware wears out, and Silver is pretty old. The last few times I've rebooted it, it's required a bit of nudging to bring it back up. If it powered down today, it would probably come back, but it might not. As of today's switchover, that scenario won't result in weeks of Wikitech downtime.
  • T169099: Tracking OpenStack upgrades. OpenStack (the software project that includes Horizon and most of our virtual machine infrastructure) releases a new version every six months. Ubuntu packages up every version with all of its dependencies, and provides a clear upgrade path between versions. Debian, for the most part, does not. The new release of Horizon is no longer deployed through an upstream package at all. Instead it is a pure Python deploy: we start with the raw Horizon source and requirements list, roll them into wheels, and install those into an isolated virtual environment. It's unclear exactly how we'll transition our other OpenStack components away from Ubuntu, but this Horizon deploy provides a potential model for deploying any OpenStack project, any version, on any OS. Having done this, I'm much less worried about our reliance on often-fickle upstream packagers.
  • T187506: High availability. The old versions of these web services were hosted on single servers. Any maintenance or hardware downtime meant that the websites were gone for the duration. Now we have a pair of servers with a shared cache, behind a load balancer. If either of the servers dies (or, more likely, we need to reboot one for kernel updates) the website will remain up and responsive.
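
For the curious, the wheel-based Horizon deploy mentioned in the list above can be sketched roughly as follows. This is only an illustration of the general technique (build wheels from source plus a requirements list, then install them into a virtual environment without touching the network), not our actual deployment tooling. The function names, the directory paths, and the assumption that the source tree has a requirements.txt and installs under the name "horizon" are all hypothetical.

```python
import subprocess
from pathlib import Path


def build_commands(source_dir, wheel_dir, venv_dir, package="horizon"):
    """Return the three commands of a wheel-based deploy, in order.

    All arguments are hypothetical examples; 'package' is assumed to be
    the name the source tree installs under.
    """
    source = Path(source_dir)
    wheels = Path(wheel_dir)
    venv = Path(venv_dir)
    return [
        # 1. Build wheels for the source tree and for everything in its
        #    requirements list, collecting them in one directory.
        ["python3", "-m", "pip", "wheel", "--wheel-dir", str(wheels),
         "-r", str(source / "requirements.txt"), str(source)],
        # 2. Create an isolated virtual environment.
        ["python3", "-m", "venv", str(venv)],
        # 3. Install into that environment using only the local wheels,
        #    never a package index.
        [str(venv / "bin" / "pip"), "install", "--no-index",
         "--find-links", str(wheels), package],
    ]


def deploy(source_dir, wheel_dir, venv_dir, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    commands = build_commands(source_dir, wheel_dir, venv_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the deploy at the first failing step.
            subprocess.run(cmd, check=True)
    return commands
```

Calling deploy("/srv/horizon-src", "/srv/wheels", "/srv/horizon-venv") with the default dry_run=True only prints the three commands. Because nothing in the last step depends on OS packages or a package index, the same recipe should in principle work for other Python-based OpenStack components.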

Of course, just as we've finished moving Wikitech to HHVM, the main Wikimedia cluster is being upgraded from HHVM to PHP 7, and Wikitech will soon follow suit. The websites look the same, but the race never ends.

Written by Andrew on Mar 9 2018, 11:42 PM.

Event Timeline

This is a clear picture of the never-ending and almost entirely unenviable aspect of operations, which, IMO, doesn't get nearly enough appreciation. As with everything in the known universe, there is a constant battle against entropy just to maintain the base platform that everyone usually takes for granted.

Kudos to you, @Andrew. The effort is appreciated.
