Move Wikitech onto the production MW cluster
Open, High, Public

Description

Right now Wikitech runs on WMCS-managed systems in isolation from the main MediaWiki hosting cluster. This "snowflake" deployment leads both to confusion for other MediaWiki support teams and to reduced functionality for Wikitech as a wiki.

Now that Wikitech doesn't involve any OpenStack API integration, it can and should move to the standard production wiki cluster.

Note that this does not mean that Wikitech will become an SUL wiki; that will happen later. Wikitech will still create, manage, and consume LDAP credentials.

Requirements:

Event Timeline

Reedy updated the task description.
Reedy updated the task description.
bd808 triaged this task as High priority. Feb 28 2020, 12:46 AM
bd808 updated the task description.

I have a very specific concern with this, and it's about isolation.

When MW is down, people still want to reach wikitech. Sure, we have wikitech-static, but for example *search won't work* there if wikitech is down.

So having a separation between the main cluster and wikitech is a welcome fact. What I think we could do is to create a couple of machines managed by the same puppet code as appservers, pointing to different resources though (so, mcrouter points to two memcached servers, etc).

Given the importance wikitech has for troubleshooting documentation, I would like to keep it separated from the main infrastructure as much as possible.

> Sure, we have wikitech-static, but for example *search won't work* there if wikitech is down.

If so, that is a bug. The search on wikitech static was broken for a bit (T243730), but that was a simple MySQL table corruption. I don't know of any quantum entanglement between wikitech and wikitech-static outside of the daily data export/import jobs.

> Given the importance wikitech has for troubleshooting documentation, I would like to keep it separated from the main infrastructure as much as possible.

The concern above about search is true here though; wikitech uses the main cirrussearch cluster. It is also behind the shared CDN layer. It also uses a database on the m5 cluster. It also lives in the eqiad DC. There is some separation, but that separation is actually a burden and not a benefit.

Features keep disappearing from Wikitech as the main cluster and the Puppet manifests around it are refactored and updated. The fact that we can't even have Visual Editor now feels like a big last straw for me. We can't have VE; we can't have beta features; we can't have Flow; we can't expose the metadata of the wiki on the Wiki Replicas; the bounce handler is broken; etc. The bugs I file about such things are given the broad response of "oops, sorry that only works if you are hooked up to <<infrastructure X>>". So, I believe we have to choose one of:

Of all of these options, moving wikitech seems the least disruptive to me. It also gets us one step closer to my ideal world of T161859: Make Wikitech an SUL wiki which very certainly can't happen until wikitech is running from an isolation zone that can access s7.

Related idea that @Reedy brought up in IRC chat: what if we skip the legacy cluster and make wikitech the first MW-on-K8s wiki? That could allow isolation of the php-ldap bits from T237889: Install php-ldap on all MW appservers and would give a low access rate but actively used wiki for indefinite testing of how to connect all the other things up to the Kubernetes cluster deployment. It doesn't even need multi-version and the related baggage there. If the deployed version on wikitech lagged the train by minutes, hours, days, or even weeks that would be ok too.

re: isolation -- I'd like us to continue to regard wikitech-static as the backstop for technical docs. If we have concerns about the availability/reliability of wikitech-static then we should list and address those issues.

re: wikitech-on-k8s -- if someone wants to take this on I have no objection, but my understanding is that making wikitech a 'normal' wiki is only a small amount of work, and I'd hate to see an ambitious unrelated task stand in the way of that. We could certainly still move wikitech to k8s after the fact if that seems useful.

I just checked the search on wikitech-static and it returned the following error:

[b17175d42b31c3fb3373c2cc] /w/index.php?search=data+persistence&title=Special%3ASearch&go=Go Error: Call to a member function caseFold() on null

Backtrace:

from /srv/mediawiki/w/extensions/TitleKey/TitleKey_body.php(60)
#0 /srv/mediawiki/w/extensions/TitleKey/TitleKey_body.php(232): TitleKey::normalize(string)
#1 /srv/mediawiki/w/extensions/TitleKey/TitleKey_body.php(228): TitleKey::exactMatch(integer, string)
#2 /srv/mediawiki/w/extensions/TitleKey/TitleKey_body.php(215): TitleKey::exactMatchTitle(Title)
#3 /srv/mediawiki/w/includes/HookContainer/HookContainer.php(330): TitleKey::searchGetNearMatch(string, NULL)
#4 /srv/mediawiki/w/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook(string, array, array, array)
#5 /srv/mediawiki/w/includes/HookContainer/HookRunner.php(3329): MediaWiki\HookContainer\HookContainer->run(string, array)
#6 /srv/mediawiki/w/includes/search/SearchNearMatcher.php(162): MediaWiki\HookContainer\HookRunner->onSearchGetNearMatch(string, NULL)
#7 /srv/mediawiki/w/includes/search/SearchNearMatcher.php(64): SearchNearMatcher->getNearMatchInternal(string)
#8 /srv/mediawiki/w/includes/specials/SpecialSearch.php(341): SearchNearMatcher->getNearMatch(string)
#9 /srv/mediawiki/w/includes/specials/SpecialSearch.php(200): SpecialSearch->goResult(string)
#10 /srv/mediawiki/w/includes/specialpage/SpecialPage.php(646): SpecialSearch->execute(NULL)
#11 /srv/mediawiki/w/includes/specialpage/SpecialPageFactory.php(1382): SpecialPage->run(NULL)
#12 /srv/mediawiki/w/includes/MediaWiki.php(309): MediaWiki\SpecialPage\SpecialPageFactory->executePath(Title, RequestContext)
#13 /srv/mediawiki/w/includes/MediaWiki.php(913): MediaWiki->performRequest()
#14 /srv/mediawiki/w/includes/MediaWiki.php(546): MediaWiki->main()
#15 /srv/mediawiki/w/index.php(53): MediaWiki->run()
#16 /srv/mediawiki/w/index.php(46): wfIndexMain()
#17 {main}

I also get randomly redirected to regular Wikitech when trying to reach Wikitech-static. Couldn't figure out the exact conditions to trigger one or the other.

static needs updating to somewhere more near HEAD of REL1_36...

> static needs updating to somewhere more near HEAD of REL1_36...

  • done, although I had to hack around some dependency issues because composer.json asked for some impossible combinations.

Search is still failing. I also see the issue with random redirects but haven't found a fix.

@Reedy, did I ever succeed in getting you the root password for wikitech-static?

> @Reedy, did I ever succeed in getting you the root password for wikitech-static?

You did! I just couldn't remember where I'd saved it. I found it again!

https://wikitech.wikimedia.org/w/index.php?search=data+persistence&title=Special%3ASearch&go=Go&ns0=1&ns12=1&ns116=1&ns498=1 now works, I think? The extensions were a bit out of skew with MW, so I've fixed that, and that looks to have helped.

I've tidied up the composer stuff too.

T257643: https://wikitech-static.wikimedia.org/wiki/ redirecting improperly was filed for the redirects before, but there was no real luck hammering it down.

> T257643: https://wikitech-static.wikimedia.org/wiki/ redirecting improperly was filed for the redirects before, but there was no real luck hammering it down.

And it is probably fixed now.

With search and redirects fixed, looks like we should be in a good state to get back to the discussion about making Wikitech a regular wiki.

As a side note, we could use a process to check the state of Wikitech and other rarely used tools that are crucial in incident response. I captured this idea in T290130.
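For illustration only, such a check could be as small as the sketch below. It probes the two failure modes seen in this thread: search erroring out on wikitech-static, and wikitech-static redirecting to regular Wikitech. The names `check_url`, `http_fetch`, and `CheckResult` are hypothetical, the search URL is assumed by combining the wikitech-static host with the query path from the error report above, and treating any non-200 status as a failure is also an assumption. This is not what T290130 specifies.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
from urllib.parse import urlsplit
import urllib.error
import urllib.request

# A fetcher returns (final URL after redirects, HTTP status code).
Fetcher = Callable[[str], Tuple[str, int]]

# URLs to probe. The search URL is an assumption (host plus the query
# path quoted in the error report); adjust to whatever should be monitored.
URLS = [
    "https://wikitech-static.wikimedia.org/wiki/",
    "https://wikitech-static.wikimedia.org/w/index.php"
    "?search=data+persistence&title=Special%3ASearch&go=Go",
]


@dataclass
class CheckResult:
    url: str
    ok: bool
    reason: str


def http_fetch(url: str) -> Tuple[str, int]:
    """Fetch url with the standard library, following redirects."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "wikitech-static-check (sketch)"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.geturl(), resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise; the status code is enough to fail the check.
        return url, err.code


def check_url(url: str, fetch: Fetcher = http_fetch) -> CheckResult:
    """Fail if the request errors, leaves the expected host, or is not 200."""
    expected_host = urlsplit(url).hostname
    try:
        final_url, status = fetch(url)
    except OSError as err:  # URLError and socket timeouts are OSError subclasses
        return CheckResult(url, False, f"request failed: {err}")
    final_host = urlsplit(final_url).hostname
    if final_host != expected_host:
        return CheckResult(url, False, f"redirected to {final_host}")
    if status != 200:
        return CheckResult(url, False, f"HTTP {status}")
    return CheckResult(url, True, "ok")
```

Running `[check_url(u) for u in URLS]` from a scheduled job and alerting on any result with `ok` set to false would be one way to use it. The fetcher is passed in as a parameter so the logic can be exercised without network access.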

> Related idea that @Reedy brought up in IRC chat: what if we skip the legacy cluster and make wikitech the first MW-on-K8s wiki? That could allow isolation of the php-ldap bits from T237889: Install php-ldap on all MW appservers and would give a low access rate but actively used wiki for indefinite testing of how to connect all the other things up to the Kubernetes cluster deployment. It doesn't even need multi-version and the related baggage there. If the deployed version on wikitech lagged the train by minutes, hours, days, or even weeks that would be ok too.

This is a good idea indeed, something I wanted to propose myself. Not sure how much of a disruption it would be to build another variant of the mediawiki image, but maybe we can include php-ldap in the debug image that also includes the profiler, and just disable the profiler on the wikitech installation.

Once the issues we have with mediawiki on k8s are all ironed out, I'll get back to this.

ebernhardson@mwdebug1002:~/mw-phpdbg$ mwscript shell.php --wiki=labswiki
>>> wfMessage( 'ok' )->text()
LogicException with message 'Process cache for 'en' should be set by now.'

[…] the labswiki-specific memcached not being available from mwmaint*

I ran into this again when running a maintenance script. I've documented it on Maintenance server so that, while people will inevitably try this again, it should reduce the time spent debugging.