
⬆️ Spike: Move DB creation out of the Platform API and into MediaWiki
Closed, Resolved · Public

Description

We dump the DB schema from a running MW instance and give it to the Platform API so it can keep a handful of DBs preconfigured for when a new wiki is created. Each time we do an MW update, we have to run a new MW instance and dump the DB schema again.

Docs on how this is currently done: https://github.com/wbstack/api/tree/main/database/mw
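Roughly, the current manual flow looks like the sketch below. This is a hedged illustration only: the database name is a placeholder and the exact commands and file paths are in the linked docs.

```shell
# Sketch of the current manual flow (placeholder names; see the linked docs
# for the real steps): dump only the table definitions, not row data, from a
# running MW instance's database, so the Platform API can replay them.
DB_NAME="mediawiki_template"
DUMP_CMD="mysqldump --no-data --skip-comments ${DB_NAME}"

# In practice something like: $DUMP_CMD > database/mw/<version>.sql
echo "$DUMP_CMD"
```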

This is a bit backwards given that MW already knows the DB schema and can create new DB instances. Investigate whether moving the creation of the database into MediaWiki would make upgrades simpler and reduce the number of repos that need modifying. Figure out what the process would be:

A/C:

  • summarise how this would work in text or create a PoC patchset

Initial Timebox: 16 hours
Remaining Time: 0 hours

Event Timeline

Tarrow added a subscriber: Andrew-WMDE.
Ollie.Shotton_WMDE renamed this task from Spike: Move DB creation out of api and into Mediawiki to Spike: Move DB creation out of the Platform API and into MediaWiki. Jul 29 2025, 10:37 AM
Ollie.Shotton_WMDE renamed this task from Spike: Move DB creation out of the Platform API and into MediaWiki to ⬆️ Spike: Move DB creation out of the Platform API and into MediaWiki. Aug 13 2025, 1:00 PM

This is a bit backwards given that MW already knows the DB schema and can create new DB instances.

This is, I believe, technically incorrect.
Currently the mediawiki-db-manager SQL user is the one that creates tables (since this was all wbstack.com):
https://github.com/search?q=repo%3Awmde%2Fwbaas-deploy%20mediawiki-db-manager&type=code
I see there are some other users etc. now, so I might be missing something, but looking at the search this is still my belief.

Mediawiki only has access to the individual per wiki credentials, which the mediawiki-db-manager creates for them.
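A hypothetical sketch of the privilege separation being described (all names and the password are placeholders): the mediawiki-db-manager user creates the per-wiki database and user, and MediaWiki is only ever given the per-wiki credentials.

```shell
# Placeholder names throughout; this only illustrates the separation of
# SQL user access described above, not the actual Platform API code.
WIKI_DB="mwdb_site1"
WIKI_USER="mwu_site1"

SQL="CREATE DATABASE ${WIKI_DB};
CREATE USER '${WIKI_USER}'@'%' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON ${WIKI_DB}.* TO '${WIKI_USER}'@'%';"

# In practice: run while connected as mediawiki-db-manager, e.g. piped into
# mysql; MediaWiki itself only receives WIKI_USER's credentials.
echo "$SQL"
```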

Essentially the idea behind this back in the day was that there was a "mediawiki-manager" service, which dealt with managing the mediawiki service, application etc.
This just so happens to have been implemented as part of the Platform API.
This allowed for separation of the mediawiki application and the thing that was managing it.
Whether this is actually important with the way things are structured now, that's for y'all to determine!

Moving the creation of the database into MediaWiki will make upgrades simpler and reduce the number of repos that need modifying.

Another alternative that serves the goal of "making the updates easier" is to have the SQL built automatically in CI rather than via the manual steps.


https://www.wbstack.com/tech/decisions/0005-backend-apis-and-services.html#backend-wiki-service gives some context on this boundary

Though since the early days, more of the logic for the things this backend service does has moved into MediaWiki via private, internal-only API calls etc. (and/or maintenance scripts).
This also mostly relies on the fact that the wiki is already set up and configured.

Back in the day, the backend service could have just called install.php, and that is also potentially still an option, rather than doing the manual schema steps.
The reason behind the decision not to do that years back is that install.php was seen as flaky, and running install.php seemingly made it easier to end up with varying schemas between wikis.
Starting each site with a hardcoded set of SQL / tables alleviated both of those points.

However, I believe install.php has gotten better these days.
Potentially a new pattern is the API creating the DB credentials (keeping the separation of SQL user access), and delegating to install.php to just make the thing :)?
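That split could be sketched as follows. The flag names are real install.php maintenance-script options, but every value is a placeholder, and the assumption that the per-wiki DB user already exists (created by the Platform API as mediawiki-db-manager) is part of the sketch, not an established design.

```shell
# Step 1 (Platform API, as mediawiki-db-manager): create the DB and a
# per-wiki user, as today. Step 2 (MediaWiki): run install.php with ONLY
# that per-wiki user's credentials. All values below are placeholders.
INSTALL_CMD="php maintenance/install.php \
  --dbtype=mysql --dbserver=db.example.internal \
  --dbname=mwdb_site1 --dbuser=mwu_site1 --dbpass=changeme \
  --server=https://site1.example.com --scriptpath=/w \
  --lang=en --pass=adminpassword 'Site 1' 'Admin'"
echo "$INSTALL_CMD"
```

Because install.php never sees credentials beyond its own database, a bug or non-deterministic run can at worst damage that one wiki's schema.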

Tarrow updated the task description.

rosalieper opened https://github.com/wmde/wbaas-deploy/pull/2296

[DNM] Job for creating wiki dbs using mediawiki install.php

The patch shows that we can create new wikis using Kubernetes jobs. The job runs a MediaWiki container that executes install.php. It creates a new database with the appropriate schema and installs the wiki using the given parameters.
I made the job in a different namespace for the sake of the PoC.

  • If we go with the Kubernetes job approach, we will have the Platform API triggering the job. There are some community-maintained k8s PHP client APIs; we could use one of those and have the Platform API set up or call the job, but we still get the benefit of MediaWiki creating the DBs itself, reducing the complexity of MediaWiki updates. The only downside I see is that every time a new wiki is created we have to create a temporary MediaWiki container within the job to do this.
  • If we follow the approach of an internal endpoint, say some createWikiHandler.php, which would be responsible for running install.php and calling GlobalSet.php to register the wiki, we would probably have a cleaner architecture, as the Platform API would just make an API call and everything else would be handled by MediaWiki. Unfortunately I wasn't able to come up with a PoC for this within the timebox limit.
  • An alternative would be to have it done through an API call and a MediaWiki hook that serves as the bridge between the Platform API and the MediaWiki installation: the hook listens for the API call, connects to the existing DB pod, and runs the MediaWiki installation using the install.php script. But I am also not sure if this would be easily testable.
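The Job-based flow from the first bullet could look something like the manifest below. This is a hypothetical sketch: the Job name, image tag, and install.php values are placeholders, and the real PoC manifest is in the linked pull request.

```shell
# Hypothetical one-shot Job (placeholder names/values): a temporary MediaWiki
# container runs install.php and exits. The real PoC is in the linked PR.
MANIFEST=$(cat <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: create-wiki-db
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: mediawiki
          image: mediawiki:1.43        # placeholder; PoC used the upstream image
          command:
            - php
            - maintenance/install.php
            - --dbname=mwdb_site1      # placeholder per-wiki values
            - --dbuser=mwu_site1
EOF
)
# In practice: echo "$MANIFEST" | kubectl apply -n <namespace> -f -
echo "$MANIFEST"
```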

Thanks for looking into this and the interesting thoughts.

We do already call the Kubernetes API from the Platform API, and we use this "maglof" client; you can see this, for example, in ProcessMediaWikiJobsJob, where we create a Kubernetes Job from an MW job. Although I believe we think that this client might not be the best one, I don't think this ticket is the place to reconsider that decision.

The only downside I see is that every time a new wiki is created we have to create a temporary MediaWiki container within the job to do this.

I don't really think this is a big downside at all; containers are cheap, jobs are cheap, etc. I think for me the larger downside is that it might take longer. If we don't want to "pre-cook databases" and instead call this job at Wiki creation time, that may slow down the user experience of creating a Wiki.

The other downside that @Addshore mentions is that *if* install.php is flaky or not deterministic then we'll have an interesting situation that won't be observed by the engineer manually generating the SQL files.

The reason behind the decision to not do that years back, is that install.php was seen as flakey, and seemingly running install.php would make it more easy to end up with varying schemas between wikis.

Am I right in thinking that you didn't see install.php behave non-deterministically @Rosalie_WMDE ?

I think also the PoC *may* be missing trying this with our MediaWiki image? Did I get it right that you used the upstream MediaWiki image and not the wbstack one? I saw in the docs that normally we find running install.php a bit of a faff?

The patch shows that we can create new wikis using Kubernetes jobs. The job runs a MediaWiki container that executes install.php. It creates a new database with the appropriate schema and installs the wiki using the given parameters.

It looks like the job uses the root database credentials, which probably isn't a good idea from a security point of view. Was this done just to make the PoC easier? What do you think about @Addshore's suggestion?

Potentially a new pattern is the API creating the DB credentials (keeping seperationg of SQL user access), and delegating to install.php to just make the thing :)?


The only downside I see is that every time a new wiki is created we have to create a temporary MediaWiki container within the job to do this.

I don't really think this is a big downside at all; containers are cheap, jobs are cheap, etc. I think for me the larger downside is that it might take longer. If we don't want to "pre-cook databases" and instead call this job at Wiki creation time, that may slow down the user experience of creating a Wiki.

+1, I don't think using a k8s job is a downside. Is there anything stopping us from running this k8s job ahead of Wiki creation time?


  • An alternative would be to have it done through an API call and a MediaWiki hook that serves as the bridge between the Platform API and the MediaWiki installation: the hook listens for the API call, connects to the existing DB pod, and runs the MediaWiki installation using the install.php script. But I am also not sure if this would be easily testable.

I don't think I understand this fully, so I'm probably misunderstanding something. What is the purpose of the hook? Can't the API handler "connect to the existing DB pod, and run mediawiki installations using the install.php script" (this seems like what you are suggesting in your 2nd bullet point, though)?

I don't think mediawiki hooks can listen for an API call. AIUI, hooks are triggered by MediaWiki (or extension) code. For example, the LocalUserCreated hook is called by MediaWiki code "immediately after a local user has been created and saved to the database". When that hook is called, the hook handler will call any method that was registered with it. AFAIK, there is no hook for when an API call is received, so this is probably not the right mechanism to use.

Is there anything stopping us from running this k8s job ahead of Wiki creation time?

I don't think there is.

@dena and I discussed this

To summarise:

  • if we want to continue in this direction, more investigation is required because we still need to answer:
    • is install.php going to regularly fail to run for some reason?
    • is install.php deterministic enough even if it does run?
    • is it possible to find a configuration (what should LocalSettings.php look like?) where it successfully runs in one shot, as opposed to the complex steps described in the README?
  • continuing this work right now seems a distraction from our immediate focus of actually doing the upgrade to 1.43, because we now have a working DB schema. This will be more relevant for the next upgrade
  • it is possible to install MediaWiki with a Kubernetes job
  • it seems clear that we'd want to ensure that install.php is only run with credentials for one MW DB and not some super-powerful credentials. This means we would still need a DB and user creation step to run in the API
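One way the determinism question above could be probed when this is picked up again, sketched with placeholder names: install twice into fresh databases, dump schema-only SQL, and diff. The commands here are only collected into a plan, not executed.

```shell
# Hypothetical determinism probe (all names are placeholders): build the list
# of commands we would run; actually running them needs a live DB and the
# usual install.php flags, elided here.
PLAN=""
for db in probe_a probe_b; do
  PLAN="${PLAN}php maintenance/install.php --dbname=${db} ...
mysqldump --no-data ${db} > /tmp/${db}.sql
"
done
echo "$PLAN"
# An empty diff of /tmp/probe_a.sql vs /tmp/probe_b.sql after actually running
# these would suggest install.php produced identical schemas on both runs.
```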

I will make another ticket, in the backlog, that we shan't pick up immediately, but which gives us a starting point for perhaps reopening this investigation before the next upgrade.

With the new ticket made I think this is ready to move to done. I had one follow-up that I will add to the new ticket.

dena renamed this task from ⬆️ Spike: Move DB creation out of the Platform API and into MediaWiki to ⬆️ 🚦Spike: Move DB creation out of the Platform API and into MediaWiki. Nov 7 2025, 8:59 AM
dena renamed this task from ⬆️ 🚦Spike: Move DB creation out of the Platform API and into MediaWiki to ⬆️ Spike: Move DB creation out of the Platform API and into MediaWiki. Nov 10 2025, 1:29 PM

rosalieper closed https://github.com/wmde/wbaas-deploy/pull/2296

[DNM] Job for creating wiki dbs using mediawiki install.php