Problem
Beta is a cluster of 72 virtual machines running on Wikimedia Cloud Services, and it’s the closest thing we have to a production-like staging environment. But it’s not very production-like, and it’s not maintainable.
What does the future look like if this is achieved?
- We run the mainline branch of MediaWiki, extensions, and skins in a subset of our production infrastructure.
- After code is merged, it gets automatically deployed to a subset of production.
- It’s linked to production data and production wikis, but is only accessible via specially crafted requests (e.g., via X-Wikimedia-Debug headers).
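The access model above can be sketched in Python as a “specially crafted request”: an ordinary request plus an X-Wikimedia-Debug header. This sketch makes some assumptions. The header value uses the header's existing `backend=<host>` form. The host name `staging-mw.example.wmnet` is a hypothetical placeholder, and the URL is only an example. Which backends the header would select in the proposed setup is not decided here. The sketch only builds the requests and does not send them.

```python
import urllib.request

# Hypothetical placeholder: the real staging backend would be chosen by this project.
STAGING_BACKEND = "staging-mw.example.wmnet"
PAGE_URL = "https://en.wikipedia.org/wiki/Special:Version"  # example URL only


def build_request(url, backend=None):
    """Build a GET request; add X-Wikimedia-Debug only when a backend is given.

    Under the proposal, only requests carrying this header would reach the
    staging subset; requests without it go to regular production.
    """
    headers = {}
    if backend is not None:
        # Assumed value format, modeled on the header's backend=<host> attribute.
        headers["X-Wikimedia-Debug"] = f"backend={backend}"
    return urllib.request.Request(url, headers=headers)


normal = build_request(PAGE_URL)
staged = build_request(PAGE_URL, STAGING_BACKEND)

# urllib stores header names with str.capitalize(), hence "X-wikimedia-debug".
print(normal.get_header("X-wikimedia-debug"))
print(staged.get_header("X-wikimedia-debug"))
```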
What happens if we do nothing?
We built nearly half of the virtual machines in the Beta Cluster (30 of 72) so long ago that their operating systems no longer receive security updates. If we do nothing, our only pre-production environment will lose nearly half its instances and stop working.
🧐 Why?
- Beta Cluster is valuable as a pre-production environment: it’s where developers can test the mainline branch of their code.
- In a 2018 survey, roughly 94% of developers agreed they use Beta for at least some of their testing.
- Two-thirds of developers rely on it to make informed decisions about deployment.
- Beta is not maintainable in its current state.
- In the past, we’ve temporarily dedicated resources to “fixing” Beta. Those efforts have always produced improvements, but the problem of maintaining Beta keeps growing over time.
- Beta Cluster has been up for code stewardship since 2019 (T215217).
Beta tracks our production infrastructure, and our production environment only grows.
Growth of the operations/puppet repository
Year | Files | Lines of code* | COCOMO (Constructive Cost Model): people required to build** |
2016 | 2,732 | 139,430 | 20.5 |
2017 | 3,708 (+976) | 185,394 (+45,964) | 24.6 (+4.1) |
2018 | 4,709 (+1,001) | 234,510 (+49,116) | 28.7 (+4.1) |
2019 | 5,421 (+712) | 262,015 (+27,505) | 30.9 (+2.2) |
2020 | 6,025 (+604) | 308,442 (+46,427) | 34.3 (+3.4) |
2021 | 6,349 (+324) | 355,107 (+46,665) | 37.6 (+3.3) |
Avg Change | +723 Files/Year | +43,135 LoC/Year | +3.4 People/Year |
*Source lines of code (SLOC) counted with scc: https://github.com/boyter/scc
**COCOMO, organic mode: https://en.wikipedia.org/wiki/COCOMO
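For reference, these are the textbook basic COCOMO equations for the organic mode (small teams working in a familiar domain), with KLOC as thousands of lines of code. They describe the general model and are not a record of how the table's last column was produced. The counting tool's inputs and settings may differ, so plugging the table's line counts into these coefficients won't necessarily reproduce that column exactly.

```latex
\begin{aligned}
E &= 2.4\,(\mathrm{KLOC})^{1.05} && \text{effort, in person-months} \\
D &= 2.5\,E^{0.38}               && \text{development time, in months} \\
P &= E / D                       && \text{average number of people required}
\end{aligned}
```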
Treat the exact numbers above with skepticism, but the trend is clear: our production environment grows every year.
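As a quick check on the arithmetic, this Python sketch recomputes the year-over-year changes and the “Avg Change” row from the yearly totals in the operations/puppet table. The average is computed as (2021 value − 2016 value) / 5. This equals the mean of the five yearly changes. The function names are illustrative. For example, the sketch gives +46,665 lines for 2021 and an average of about +3.4 people per year.

```python
# Yearly totals for operations/puppet, copied from the table:
# year -> (files, lines of code, COCOMO people required)
TABLE = {
    2016: (2_732, 139_430, 20.5),
    2017: (3_708, 185_394, 24.6),
    2018: (4_709, 234_510, 28.7),
    2019: (5_421, 262_015, 30.9),
    2020: (6_025, 308_442, 34.3),
    2021: (6_349, 355_107, 37.6),
}


def yearly_deltas(table):
    """Return {year: (d_files, d_loc, d_people)} for every year after the first."""
    years = sorted(table)
    deltas = {}
    for prev, cur in zip(years, years[1:]):
        p_files, p_loc, p_people = table[prev]
        c_files, c_loc, c_people = table[cur]
        # Round people to one decimal to match the table and hide float noise.
        deltas[cur] = (c_files - p_files, c_loc - p_loc, round(c_people - p_people, 1))
    return deltas


def average_change(table):
    """Average change per year: (last - first) / number of year-to-year intervals."""
    years = sorted(table)
    intervals = len(years) - 1
    first, last = table[years[0]], table[years[-1]]
    return tuple((l - f) / intervals for f, l in zip(first, last))


if __name__ == "__main__":
    for year, (df, dl, dp) in yearly_deltas(TABLE).items():
        print(f"{year}: {df:+,} files, {dl:+,} LoC, {dp:+.1f} people")
    files, loc, people = average_change(TABLE)
    print(f"Avg: {files:+,.0f} files/yr, {loc:+,.0f} LoC/yr, {people:+.1f} people/yr")
```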
We haven’t added resources to Beta, and we’ve burned out the people we’ve assigned to it as they struggle to keep up with the pace of change.