
⬆️ [Spike] Routing MW backend traffic based on MediaWiki version
Closed, Resolved · Public

Description

Decide whether, and if so, how to route traffic from Quick Statements, Widar, the Platform API and queryservice-updater to the appropriate MediaWiki back-end.

See @Tarrow's comment: wmde/wbaas-deploy/pull/2275#issuecomment-3410767848

Relevant config in wbaas-deploy:

A/C

  • Summary of whether it's possible to route traffic for Quick Statements, Widar, the Platform API and queryservice-updater to the appropriate MediaWiki back-end.
  • Summary of potential ways to route the traffic (if it's possible)
  • Both summaries should include the perceived level of effort for the work (i.e. story points)
  • If known, the summaries should include the potential increase to costs
  • PM is informed of the results of the investigation

Note:

  • If it's not possible, please stop the timebox early
  • Should the solution be manageable in the initial timebox, feel free to implement it

Initial Timebox: 16
Remaining Timebox: 8

Event Timeline

Tarrow renamed this task from ⬆️ Route tool and platform API traffic based on MediaWiki version to ⬆️ Routing MW backend traffic based on MediaWiki version. Oct 20 2025, 1:52 PM
Tarrow updated the task description.
karapayneWMDE renamed this task from ⬆️ Routing MW backend traffic based on MediaWiki version to ⬆️ [Spike] Routing MW backend traffic based on MediaWiki version. Oct 20 2025, 1:59 PM
karapayneWMDE updated the task description.

The general principle I would have expected to find is that in most cases a 1.39 MW pod can access a 1.43 DB even after update.php has been run, and will mostly operate successfully.
Most likely any additional tables/columns were additive rather than destructive, so most things should still work as expected.

However, from some rough inspection this is not true for the ipblocks table (https://www.mediawiki.org/wiki/Manual:Ipblocks_table), which seems to be the reason OAuth does not work concurrently on both 1.43 and 1.39. For example, I see errors like:

[DBQuery] Error 1146 from MediaWiki\Block\DatabaseBlock::newLoad, Table 'mwdb_f036984f01.mwt_93f623752b_ipblocks' doesn't exist SELECT  ipb_id,ipb_address,ipb_timestamp,ipb_auto,ipb_anon_only,ipb_create_account,ipb_enable_autoblock,ipb_expiry,ipb_deleted,ipb_block_email,ipb_allow_usertalk,ipb_parent_block_id,ipb_sitewide,ipb_by_actor,ipblocks_actor.actor_user AS `ipb_by`,ipblocks_actor.actor_name AS `ipb_by_text`,comment_ipb_reason.comment_text AS `ipb_reason_text`,comment_ipb_reason.comment_data AS `ipb_reason_data`,comment_ipb_reason.comment_id AS `ipb_reason_cid`  FROM `mwt_93f623752b_ipblocks` JOIN `mwt_93f623752b_actor` `ipblocks_actor` ON ((actor_id=ipb_by_actor)) JOIN `mwt_93f623752b_comment` `comment_ipb_reason` ON ((comment_ipb_reason.comment_id = ipb_reason_id))   WHERE ipb_address = '10.112.5.47' OR ((ipb_range_start  LIKE '0A70%' ESCAPE '`' ) AND (ipb_range_start <= '0A70052F') AND (ipb_range_end >= '0A70052F'))   sql-mariadb-secondary.default.svc.cluster.local
#0 /var/www/html/w/includes/libs/rdbms/database/Database.php(1576): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
#1 /var/www/html/w/includes/libs/rdbms/database/Database.php(952): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
#2 /var/www/html/w/includes/libs/rdbms/database/Database.php(1711): Wikimedia\Rdbms\Database->query(string, string, integer)
#3 /var/www/html/w/includes/libs/rdbms/database/DBConnRef.php(103): Wikimedia\Rdbms\Database->select(array, array, string, string, array, array)
#4 /var/www/html/w/includes/libs/rdbms/database/DBConnRef.php(326): Wikimedia\Rdbms\DBConnRef->__call(string, array)
#5 /var/www/html/w/includes/block/DatabaseBlock.php(291): Wikimedia\Rdbms\DBConnRef->select(array, array, string, string, array, array)
#6 /var/www/html/w/includes/block/DatabaseBlock.php(887): MediaWiki\Block\DatabaseBlock::newLoad(User, integer, boolean, string)
#7 /var/www/html/w/includes/block/BlockManager.php(161): MediaWiki\Block\DatabaseBlock::newListFromTarget(User, string, boolean)
#8 /var/www/html/w/includes/user/User.php(1434): MediaWiki\Block\BlockManager->getUserBlock(User, WebRequest, boolean, boolean)
#9 /var/www/html/w/includes/user/User.php(1532): User->getBlockedStatus(boolean, boolean)
#10 /var/www/html/w/includes/block/BlockManager.php(552): User->getBlock()
#11 /var/www/html/w/includes/MediaWiki.php(780): MediaWiki\Block\BlockManager->trackBlockWithCookie(User, WebResponse)
#12 /var/www/html/w/includes/api/ApiMain.php(905): MediaWiki::preOutputCommit(DerivativeContext)
#13 /var/www/html/w/includes/api/ApiMain.php(850): ApiMain->executeActionWithErrorHandling()
#14 /var/www/html/w/api.php(94): ApiMain->execute()
#15 /var/www/html/w/api.php(49): wfApiMain()
#16 {main}

This happens when running 1.39 pods against a database that has been upgraded to 1.43.

Possible services that send traffic to the backend which we should consider, with some notes that jumped to mind:

  • queryservice updater
    • this would only be necessary if the Special:EntityData endpoints do not work on a mismatched DB<->MW version
    • this is likely if the new db version includes something like mul
    • perhaps the easiest solution is then a parallel fleet of updaters wired to request only updates from wikis on one version or the other and load data from that respective version
  • dispatched jobs
    • this seems likely to be important
    • TODO: find docs on how dispatched jobs work on cloud
  • tools talking to a backend
    • Worried about OAuth
    • unclear what would happen to things like edits
  • api talking to mediawiki backend
    • the API already knows which wiki is running on which DB version, so no proxy is needed
Tarrow moved this task from Doing to To do on the Wikibase Cloud (Kanban Board) board.

Trying to wrap my head around this. Any feedback welcome on these thoughts.

  • This makes me think it could make sense to choose only one implementation for this, even for the Platform API, to reduce the maintenance burden
  • To DRY things out even further, I wonder if we can just use one proxy host which proxies requests to the correct MediaWiki pod, and point everything at this proxy hostname (see first bullet point)
    • could/should we re-use the platform-nginx for this proxying?

This makes me think it could make sense to only choose one implementation to do this, even for the Platform API, to reduce burden of maintenance

I haven't got a strong opinion, but if the only exception would be the Platform API, particularly since it sometimes needs to know which version to do things on (e.g. during update processes), then it might make sense to have them work in different ways (i.e. WET).

could/should we re-use the platform-nginx for this proxying?

I did consider this a bit, but I personally thought it was a bit scary, since it would then be a route for public traffic to get to the "magic" backend.

@Tarrow Good points, thanks for the input!

@Ollie.Shotton_WMDE, @dang and I just discussed this a bit and want to see if it works out to:

  1. Refactor the Platform API so that, wherever the env var is currently used, a version lookup is done instead
  2. Use a proxy for everything else (the tools), taking care not to make it publicly accessible

The PLATFORM_MW_BACKEND_HOST env var is used in Laravel Jobs and one Command in the Platform API code.

Jobs:

Command:

  • RebuildQueryserviceData

I'll create a PoC helper similar to IngressController::getWikiVersionForDomain and use it in one of the jobs.

Currently, PLATFORM_MW_BACKEND_HOST is either mediawiki-139-app-backend.default.svc.cluster.local or mediawiki-143-app-backend.default.svc.cluster.local. The version in the WikiDb model is either mw1.39-wbs1 or mw1.43-wbs1. So instead of being passed a single PLATFORM_MW_BACKEND_HOST env var, the Platform API needs a map of database version to mediawiki backend host (I think it would be bad practice to hardcode this mapping).

This could be an env var that contains some JSON, but this feels a bit hacky. What's the k8s way of doing this? Use a ConfigMap to mount a JSON/YAML file with this mapping?
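One k8s-native option along those lines (purely a sketch, not a decided design; the ConfigMap name and file name are invented here) would be to put the mapping in a ConfigMap and mount it as a JSON file:

```yaml
# Hypothetical sketch only: a ConfigMap holding the
# WikiDb version -> MediaWiki backend host mapping.
# The names "mw-backend-hosts" and "mapping.json" are invented for illustration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: mw-backend-hosts
data:
  mapping.json: |
    {
      "mw1.39-wbs1": "mediawiki-139-app-backend.default.svc.cluster.local",
      "mw1.43-wbs1": "mediawiki-143-app-backend.default.svc.cluster.local"
    }
```

Mounted as a volume, the file could then be read by the Platform API, so the mapping could be updated from wbaas-deploy without a Platform API code change.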

Currently, PLATFORM_MW_BACKEND_HOST is either mediawiki-139-app-backend.default.svc.cluster.local or mediawiki-143-app-backend.default.svc.cluster.local. The version in the WikiDb model is either mw1.39-wbs1 or mw1.43-wbs1. So instead of being passed a single PLATFORM_MW_BACKEND_HOST env var, the Platform API needs a map of database version to mediawiki backend host (I think it would be bad practice to hardcode this mapping).

This could be an env var that contains some JSON, but this feels a bit hacky. What's the k8s way of doing this? Use a ConfigMap to mount a JSON/YAML file with this mapping?

So far we've mostly put application config in here, I think: https://github.com/wbstack/api/blob/main/config/wbstack.php

Personally, I'd be comfortable if the mapping logic lived in its own Helper class or something similar.

I think so far the mapping has only lived in the platform-nginx config; maybe we could even benefit there if the IngressController also did the mapping already, so we only map on the Platform API side.

Thanks for that info.

Personally, I'd be comfortable if the mapping logic lived in its own Helper class or something similar.

That would mean we have to update the Platform API code before we can do a MediaWiki update. This is probably okay for now, but if we want to reduce the steps required to do a MW update, then I think this should be some sort of config that can be updated from wbaas-deploy rather than hard-coded in the Platform API.

I think so far the mapping has only lived in the platform-nginx config; maybe we could even benefit there if the IngressController also did the mapping already, so we only map on the Platform API side.

Having the mapping in a single place seems sensible - I'll create a follow-up task to look into this if the mapping does move to the Platform API.

I pushed a draft PR with what I tried so far: https://github.com/wmde/wbaas-deploy/pull/2295
I basically just copied the platform-nginx deployment so I can observe what happens when I point tools to it. I get a gateway timeout after 3 minutes, but I don't even see anything in the logs (neither the proxy's nor the tools pod's logs).

Also, it seems PLATFORM_MW_BACKEND_HOST is only used via magnustools, in Widar and QuickStatements: https://github.com/wbstack/magnustools/blob/e3ddd872410e3d25cec63c5d646ba8cdd9a813ae/public_html/php/WbstackMagnusOauth.php#L107
I kind of wonder if we can just use the wiki domain here, so we can reuse the routing that is already in place?

I kind of wonder if we can just use the wiki domain here, so we can reuse the routing that is already in place?

I don't think so; remember this needs to contact the special, magic non-public endpoint (the "backend" one) to get the OAuth consumer for the tool to use

I kind of wonder if we can just use the wiki domain here, so we can reuse the routing that is already in place?

I don't think so; remember this needs to contact the special, magic non-public endpoint (the "backend" one) to get the OAuth consumer for the tool to use

You seem to be right, but I don't understand why :P Isn't the point of OAuth that it can happen across two different sites? But yeah I quickly tried that approach and it didn't work, so I didn't look into it much further.

On a slightly different note though, I had the idea that it could be possible to query the backend API like nginx does, and it works:

root@tool-quickstatements-6fb4655bf5-fkwrr:/var/www/html# curl -v http://api-app-backend.default.svc.cluster.local/backend/ingress/getWikiVersionForDomain?domain=test-139.wbaas.dev
*   Trying 10.104.148.86:80...
* Connected to api-app-backend.default.svc.cluster.local (10.104.148.86) port 80 (#0)
> GET /backend/ingress/getWikiVersionForDomain?domain=test-139.wbaas.dev HTTP/1.1
> Host: api-app-backend.default.svc.cluster.local
> User-Agent: curl/7.88.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Fri, 31 Oct 2025 11:56:17 GMT
< Server: Apache/2.4.65 (Debian)
< X-Powered-By: PHP/8.2.29
< Cache-Control: no-cache, private
< x-version: mw1.39-wbs1
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
< Content-Type: text/html; charset=UTF-8
< 
* Connection #0 to host api-app-backend.default.svc.cluster.local left intact

So we could also add some logic to magnustools so that it reaches out to the correct MediaWiki backend host. I created PoC PRs here:

(the latter one is ready to test with skaffold; it just uses the other magnustools PR branch)
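The routing logic in those PoCs boils down to two steps: ask the backend API for the wiki's version (as in the curl output above, where the version arrives in the x-version response header), then map that version string to a backend host. A rough sketch of that logic, in Python for illustration (the actual PoC code is PHP in magnustools; the function names here are invented):

```python
# Illustrative sketch of version -> backend host routing; not the real
# magnustools implementation, which is written in PHP.
from urllib.parse import quote

# Mapping taken from the host/version values mentioned earlier in this task;
# hardcoded here purely for the sketch.
VERSION_TO_BACKEND = {
    "mw1.39-wbs1": "mediawiki-139-app-backend.default.svc.cluster.local",
    "mw1.43-wbs1": "mediawiki-143-app-backend.default.svc.cluster.local",
}

def backend_host_for_version(version: str) -> str:
    """Resolve a WikiDb version string to the matching MediaWiki backend host."""
    try:
        return VERSION_TO_BACKEND[version]
    except KeyError:
        raise ValueError(f"no backend known for version {version!r}")

def version_lookup_url(domain: str) -> str:
    """URL of the backend API endpoint that reports a wiki's version
    (the same endpoint queried in the curl example above)."""
    return (
        "http://api-app-backend.default.svc.cluster.local"
        "/backend/ingress/getWikiVersionForDomain?domain=" + quote(domain)
    )
```

A tool would call the lookup URL for the wiki's domain, read the x-version header from the response, and then use backend_host_for_version to pick the backend to talk to.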


Trying to summarize our outcomes of looking into this last week:

Summary of whether it's possible to route traffic for Quick Statements, Widar, the Platform API and queryservice-updater to the appropriate MediaWiki back-end.
Summary of potential ways to route the traffic (if it's possible)

  • In general the idea of a proxy came up (or reusing the platform-nginx deployment), but this attempt wasn't successfully tested: https://github.com/wmde/wbaas-deploy/pull/2295/files
  • For the Platform API it was decided to do the lookup internally, as outlined here: https://github.com/wbstack/api/pull/987
  • Regarding queryservice-updater, Tom mentioned "perhaps the easiest solution is then a parallel fleet of updaters wired to request only updates from wikis on one version or the other and load data from that respective version"

Both summaries should include the perceived level of effort for the work (i.e. story points)

  • Approach for magnustools-related components would be ready for engineering review (story points guessed: 1-2)
  • Approach for Platform API would possibly need a bit more integration work done (story points guessed: 3-5)
  • Approach for queryservice-updater not started (story points guessed: 5-8)

If known, summary should include potential increase to costs

Possibly no increase in resources needed

PM is informed of the results of the investigation

ping @Anton.Kokh

We should move this to done once we have made three new tickets:

  • Approach for magnustools-related components, ready for engineering review (T409064)
  • Approach for Platform API (T409085)
  • Another investigation task, more tightly scoped to just the queryservice-updater situation (T409087)
Tarrow claimed this task.