Thumbor support for private wikis deployed
The importance of the migration long tail

Yesterday we deployed Thumbor support for Wikimedia-hosted private wikis. While 99.9% of our traffic is for public-facing wikis, the Wikimedia Foundation hosts a number of private MediaWiki instances on the same infrastructure. Those wikis facilitate work for various groups in the movement, from community-run projects like OTRS to local chapters, staff, or the board. They're essential to the Wikimedia Movement, but because they're private, they're an architectural special case.

When we migrated all public thumbnail traffic to Thumbor as the rendering backend last June, it would have been easy to declare the job done and move on to something else, turning a blind eye to the special case of private wikis. But their different setup meant that they were still using the MediaWiki-based thumbnailing cluster. It was a clear waste of resources to have a whole (reduced, but still multi-machine) cluster dedicated to a special case representing so little traffic. More importantly, it meant that for tasks like security work or software upgrades, we had two image-processing clusters to care about, the new Thumbor one and the legacy MediaWiki image scaling one, with very different testing involved for each.

What makes thumbnailing different for private wikis is that, like any content on them, images are meant to be viewed only by people with access to those wikis. For public wikis, authentication isn't required, and that's what lets us have a more streamlined stack that doesn't hit MediaWiki. Public wiki thumbnails are heavily cached in Varnish. For private wikis, MediaWiki's authentication acts as the gatekeeper that decides whether a client may view a thumbnail. Varnish doesn't cache the thumbnails of private wikis, and merely forwards the request to MediaWiki.
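To make the difference between the two paths concrete, here is a minimal sketch of the caching behavior described above. It is a toy model, not our actual Varnish or MediaWiki configuration: the `EdgeCache` class, the `fetch` method, and the `render` callback are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class EdgeCache:
    """Toy stand-in for the caching layer (Varnish in the post)."""

    store: Dict[str, str] = field(default_factory=dict)

    def fetch(
        self,
        wiki_is_private: bool,
        url: str,
        user_can_read: bool,
        render: Callable[[str], str],
    ) -> Tuple[int, Optional[str]]:
        if wiki_is_private:
            # Private wikis: nothing is cached. The request is forwarded,
            # and the authentication check (MediaWiki's job in the post,
            # reduced to a boolean here) runs on every request.
            if not user_can_read:
                return 403, None
            return 200, render(url)

        # Public wikis: no authentication, so a cached copy can be
        # served to anyone without touching the backend again.
        if url not in self.store:
            self.store[url] = render(url)
        return 200, self.store[url]
```

In this model, a public thumbnail is rendered once and then served from `store`, while a private thumbnail goes through the authentication check and the backend every time it is requested.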

With the new system deployed yesterday, when MediaWiki receives a request for a new thumbnail on a private wiki, instead of rendering it itself as it used to, it proxies the request to the same Thumbor cluster used by public wikis, which takes care of the rendering. Some additional gatekeeping is in place in Thumbor to ensure that requests coming from the public wiki pipeline cannot access images that belong to private wikis. Essentially, rendering is now centralized on the single Thumbor cluster, which takes care of both worlds, while still keeping Thumbor decoupled from MediaWiki authentication (since for security reasons, we don't want Thumbor to interact with MediaWiki databases).
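The split of responsibilities can be sketched roughly as follows. This post doesn't describe how the gatekeeping in Thumbor is actually implemented, so the shared-secret header below is purely an assumption chosen for illustration, as are the wiki names, the header name, and both function names. The point is only that MediaWiki keeps deciding who may see an image, while Thumbor needs nothing more than a way to tell trusted proxied requests apart from public ones.

```python
import hmac
from typing import Callable, Dict, Optional, Tuple

# All values below are hypothetical, for illustration only.
PRIVATE_WIKIS = {"officewiki", "boardwiki"}
SHARED_SECRET = "example-secret"
SECRET_HEADER = "X-Internal-Secret"


def thumbor_allows(wiki: str, headers: Dict[str, str]) -> bool:
    """Gatekeeping on the rendering side: no knowledge of users or databases."""
    if wiki not in PRIVATE_WIKIS:
        return True  # public originals can be rendered for anyone
    supplied = headers.get(SECRET_HEADER, "")
    # Constant-time comparison, to avoid leaking the secret through timing.
    return hmac.compare_digest(supplied, SHARED_SECRET)


def mediawiki_thumbnail(
    wiki: str,
    path: str,
    user_can_read: bool,
    render: Callable[[str], str],
) -> Tuple[int, Optional[str]]:
    """MediaWiki side: authenticate the user, then proxy to the renderer."""
    if not user_can_read:
        return 403, None
    headers = {SECRET_HEADER: SHARED_SECRET}
    if not thumbor_allows(wiki, headers):
        return 403, None
    return 200, render(path)
```

In this sketch, a request arriving through the public pipeline carries no secret, so `thumbor_allows("officewiki", {})` is false, while the same call for a public wiki succeeds without any header.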

Barring any unforeseen issues while we keep an eye out for potential bugs in the coming months, we will most likely retire the MediaWiki-based image scaling cluster this year, thereby truly concluding the migration of all thumbnail rendering across our entire infrastructure to Thumbor.

Sometimes it takes a lot of extra work to tackle those special cases, which can feel like a chore after having switched 99.9% of the traffic already. But the cost of keeping a legacy system running for a special case cannot be overlooked. Beyond keeping a cluster of mostly idle machines in two data centers, the duplicated maintenance work is also expensive and never really quantified. Reaching true completion and decommissioning a legacy cluster feels great, though. It's really worth putting in the extra effort!

Written by Gilles on Thu, Feb 22, 10:34 AM.
Senior Performance Engineer, WMF