The Cloud Services team is in the process of updating and standardizing the use of DNS names throughout Cloud VPS projects and infrastructure, including the Toolforge project. A lot of this has to do with reducing our reliance on the badly-overloaded term 'Labs' in favor of the 'Cloud' naming scheme. The whole story can be found on this Wikitech proposal page. These changes will be trickling out over the coming weeks or months, but one change you might notice already.
Last week, we completed a piece of long-neglected work relating to Puppet, the tool that manages the configuration of every virtual machine in our cloud. Historically, each VM has received its configuration from a physical, production server (the 'puppetmaster'). This meant that there was a constant chatter of traffic back and forth between each VM and unrelated networks and hardware sitting in Wikimedia production. Now, the puppetmasters are located on VMs, so all of that chatter is internal to Cloud Services.
A couple of week ago we finally moved the last lingering VMs in our OpenStack platform from the nova-network region to the Neutron region (Blog Post: Neutron is here!). The bulk of the work had been done a month earlier, so the final nails in nova-network's coffin felt a bit anticlimactic -- nevertheless, this is a big step that represents a huge amount of work on the part of both staff and volunteers.
Ubuntu Trusty was released in April 2014, and support for it (including security updates) will cease in April 2019. We need to shut down all Trusty hosts before the end of support date to ensure that Toolforge remains a secure platform. This migration will take several months because many people still use the Trusty hosts and our users are working on tools in their spare time.
As promised in an earlier post (Blog Post: Neutron is (finally) coming), we've started moving a few projects on our Cloud-VPS service into a new OpenStack region that is using Neutron for its software-defined networking layer. It's going pretty well! The new region, 'eqiad1', is currently very small, and growth is currently blocked by hardware issues (see T199125 for details) but we hope to resolve that issue soon.
Just as we were settling into nova-network (along with other early OpenStack adopters), the core developers were already moving on. A new project (originally named 'Quantum' but eventually renamed 'Neutron') would provide stand-alone APIs, independent from the Nova APIs, to construct all manners of software-defined networks. With every release Neutron became more elaborate and more reliable, and became the standard for networking in new OpenStack clouds.
Between 2017-11-20 and 2017-12-01, the Wikimedia Foundation ran a direct response user survey of registered Toolforge users. 141 email recipients participated in the survey which represents 11% of those who were contacted.
The Foundation fiscal quarter running from January 2018 through March 2018 was a busy one for the Cloud Services team:
I've spent the last few months building new web servers to support some of the basic WMCS web services: Wikitech, Horizon, and Toolsadmin. The new Wikitech service is already up and running; on Wednesday I hope to flip the last switch and move all public Horizon and Toolsadmin traffic to the new servers as well.
Long ago, the Wikimedia Operations team made the decision to phase out use of Ubuntu servers in favor of Debian. It's a long, slow process that is still ongoing, but in production Trusty is running on an ever-shrinking minority of our servers.
One of our quarterly goals was "Define a metric to track OpenStack system availability". Despite the weak phrasing, we elected to not only pick something to measure but also to actually measure it.
The current physical servers for the <wiki>_p Wiki Replica databases are at the end of their useful life. Work started over a year ago on a project involving the DBA team and cloud-services-team to replace these aging servers (T140788). Besides being five years old, the current servers have other issues that the DBA team took this opportunity to fix:
- Data drift from production (T138967)
- No way to give different levels of service for realtime applications vs analytics queries
- No automatic failover to another server when one failed
24% of Wikipedia edits over a three month period in 2016 were completed by software hosted in Cloud Services projects. In the same time period, 3.8 billion Action API requests were made from Cloud Services. We are the newly formed Cloud Services team at the Foundation, which maintains a stable and efficient public cloud hosting platform for technical projects relevant to the Wikimedia movement. -- https://blog.wikimedia.org/2017/09/11/introducing-wikimedia-cloud-services/
Back in year zero of Wikimedia Labs, shockingly many services were confined to a single box. A server named 'virt0' hosted the Wikitech website, Keystone, Glance, Ldap, Rabbitmq, ran a puppetmaster, and did a bunch of other things.
(reposted with minor edits from https://lists.wikimedia.org/pipermail/labs-l/2017-July/005036.html)
Debian Stretch was officially released on Saturday, and I've built a new Stretch base image for VPS use in the WMF cloud. All projects should now see an image type of 'debian-9.0-stretch' available when creating new instances.
Back in the dark ages of Labs, all instance puppet configuration was handled using the puppet ldap backend. Each instance had a big record in ldap that handled DNS, puppet classes, puppet variables, etc. It was a bit clunky, but this monolithic setup allowed @yuvipanda to throw together a simple but very useful tool, 'watroles'. Watroles answered two questions:
The first very visible step in the plan to rename things away from the term 'labs' happened around 2017-06-05 15:00Z when IRC admins made the #wikimedia-labs irc channel on Freenode invite-only and setup an automatic redirect to the new #wikimedia-cloud channel.
When @Ryan_Lane first built OpenStackManager and Wikitech, one of the first features he added was an interface to setup project-wide sudo policies via ldap.
For nearly a year, Horizon has supported instance management. It is altogether a better tool than the Special:NovaInstance page on Wikitech -- Horizon provides more useful status information for VMs, and has much better configuration management (for example changing security groups for already-running instances.)
I've just installed a new public base image, ' debian-9.0-stretch (experimental)' and made it available for all projects. It should appear in the standard 'Source' UI in Horizon any time you create a new VM.
Andrew will be upgrading our Openstack install from version 'Kilo' to version 'Liberty' on Tuesday the 2nd. The upgrade is scheduled to take up to three hours. Here's what to expect:
If you are exclusively a user of tool labs, you can ignore this post. If you use or administer another labs project, this REQUIRES ACTION ON YOUR PART.
The Kubernetes ('k8s') backend for Tool Labs webservices is open to
beta testers from the community as a replacment for Grid Engine
horizon (OpenStack Dashboard) is the canonical implementation of OpenStack’s Dashboard, which provides a web based user interface to OpenStack services.
tools-login.wmflabs.org is on a new bastion host with twice
the RAM and CPU of the old one. This should hopefully provide a better
bandaid against it getting overloaded up. More discussion about a
longer term solution at https://phabricator.wikimedia.org/T131541
On Tuesday, 2016-04-05, we'll be upgrading Kubernetes to 1.2 and using
a different deployment method as well. While this should have no user
facing impact (ideally!) the following things might be flaky for a
period of time on that day: