
PHP-based alternative to wikimedia/service-template-node
Open, Low, Public

Description

In the self-service deployment world of WikiKube, it's helpful to have a starter kit for product/feature engineering teams. For the last ~8 years, that starter kit has been wikimedia/service-template-node.

From my perspective, this has two main drawbacks:

  • It's based on Node.js and its ecosystem, which is not a stack that we assess when hiring product/feature engineering staff. Consequently, we don't have much in-house expertise on which packages to use, application setup, dependency update practices, testing setup, etc.
  • Perhaps due to lack of in-house expertise, we haven't been making many improvements/fixes to wikimedia/service-template-node for several years. To give one example, it has dependencies on deprecated projects, and in other cases is using deprecated dependencies released ~5 years ago. This makes it hard/impossible to run tools like osv-scanner on apps that build on top of wikimedia/service-template-node.

Hypotheses for PHP-based framework

  • because we have a lot of PHP experience in-house, teams will be much more productive in building new applications to deploy to Kubernetes
  • if we have a PHP-based framework to point teams to, we'll have a better chance of maintaining a shared framework for services that feature teams deploy to the Kubernetes cluster.

We'd also be able to reuse components from MediaWiki. wikimedia/ToolforgeBundle is an example of what I'm talking about, though I think we'd want something more specifically tailored to API use cases.

Drawbacks / Critiques

  • There's no guarantee we'd end up with a maintained framework. That requires resourcing. I think we'd have a better likelihood of it due to in-house familiarity with PHP, though.
  • A common use case for deploying applications to Kubernetes is to integrate with a machine learning model, and that will end up using Python code and libraries. We also don't (as far as I know) have a consensus or common framework for shipping Python applications. Maybe that is more important to focus on than a PHP-based distribution.
  • @Tgr: "A possible drawback is that PHP's shared-nothing architecture makes the handling of slow setup steps (e.g. pulling config from a MediaWiki API) complicated. (There are threaded / event looping PHP frameworks these days, though.)"
  • Deploying a PHP app in Kubernetes would also require a web server container (Apache, Nginx), which increases deployment complexity somewhat.

Event Timeline

kostajh updated the task description.

This would be great! Besides more in-house expertise and less language fragmentation, it would also be able to reuse Librarization work (e.g. T225762: Librarize the MediaWiki REST API framework if it happens).

There was an unofficial template based on Laravel in the past, although I'm not sure it was a proper template or just everyone copying the same application (iegreview or scholarships, don't remember which was first). @bd808 would probably know more.

A possible drawback is that PHP's shared-nothing architecture makes the handling of slow setup steps (e.g. pulling config from a MediaWiki API) complicated.
(There are threaded / event looping PHP frameworks these days, though.)
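One common way around the slow-setup concern is to keep the result of the slow step in a cache that outlives a single request. The sketch below is only an illustration of that pattern, written in Python for brevity; the cache location, the TTL and `fetch_config_slowly` are all hypothetical stand-ins, and a PHP service would use whatever shared cache it has at hand.

```python
import json
import os
import tempfile
import time

# Hypothetical cache location and freshness window, chosen for illustration only.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "service-config-cache.json")
CACHE_TTL_SECONDS = 300


def fetch_config_slowly():
    """Stand-in for a slow setup step, e.g. pulling config from a remote API."""
    time.sleep(0.05)
    return {"feature_enabled": True}


def load_config(fetch=fetch_config_slowly, path=CACHE_PATH,
                ttl=CACHE_TTL_SECONDS, now=time.time):
    """Return the cached config while it is fresh, otherwise fetch and re-cache."""
    try:
        with open(path, encoding="utf-8") as fh:
            cached = json.load(fh)
        if now() - cached["fetched_at"] < ttl:
            return cached["config"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable or malformed cache: take the slow path

    config = fetch()
    tmp_path = f"{path}.{os.getpid()}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"fetched_at": now(), "config": config}, fh)
    # Atomic rename so concurrent readers never see a half-written file.
    os.replace(tmp_path, path)
    return config
```

Only the first request after the TTL expires pays for the slow fetch; every other request reads the cached copy.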

In addition to the linked service-template-node, there is also service-scaffold-node and related servicelib-node. The idea being, as I understand it, that service-template-node, while a great start, encouraged more cut-and-paste style coding, and it was difficult to update any service created on it to take advantage of improvements at the template level. So the scaffolding approach would give a more lightweight "starter kit" that used things from the servicelib. Then as improvements were made to the servicelib, they could be directly (or at least more easily) incorporated into services that used them. To be clear, the idea I heard was that the scaffolding approach would replace the template approach, and therefore it wouldn't be surprising if the template repository wasn't being updated. With that said, we're not necessarily right on top of updates to the scaffolding either.

We also have somewhat formative mirror projects for golang: service-scaffold-golang and servicelib-golang. These roughly capture the state of Kask, which has been very successful for MediaWiki session storage. We used the golang repositories when creating the six AQS 2.0 services. I'll link Device Analytics here, as it is essentially complete, but the other five aren't far behind. We have plans to push improvements we identified during the AQS 2.0 implementation back to the scaffolding (https://phabricator.wikimedia.org/T325526). This work is all being rearranged as part of the current reorg, but I'm hopeful that it actually happens.

I don't have any objection to PHP tooling as well, I just wanted to mention some additional related work that exists, from which the PHP work could perhaps draw inspiration or lessons learned.

This is a pretty cool idea, if it works. I also love the Librarization work; I wasn't aware of that. The ability to run MW / PHP in an image, even if just to do some computation (in a sidecar?) rather than to serve HTTP, sounds really useful.

See also:
https://www.mediawiki.org/wiki/Service_Scaffolding#Go_%28WIP%29

A big yes. It's a common practice for orgs/companies that follow a microservices architecture to have their own service template. Such a templating system gives us:

  • easier maintenance, as templates follow standard guidelines on how to test/build
  • a systematic way to handle logs/metrics/etc.
  • a set of tooling for deploying/testing code on stage/beta servers without overthinking the infra
  • a set of approved, well-tested solutions, a.k.a. "do not reinvent the wheel"

In the long run I wouldn't limit ourselves to just one single template. For example, in one place we developed two templates:

  • simple -> based on PHP Slim framework to provide a basic functionality, like one/two API endpoints, event listener, etc
  • complex -> based on Symfony for more complex projects where

We supported both. Slim was used in most places due to complexity/performance/size/etc. The Symfony template wasn't widely used, but a couple of critical solutions were based on Symfony.

I don't think the template has to be based on MediaWiki or tightly coupled with it. I would push to have something small and easy to maintain first. It should provide a common/easy way to retrieve configs, perform logging, handle metrics, and CI instrumentation to nicely fit into our ecosystem. Also, it would be great if in the meantime we start slicing MediaWiki into a set of libraries, to get components that are easy to reuse.

On top of it, a huge advantage would be to provide a Composer project template: it would make it super simple to use, plus we would contribute even more to the Open Source community.

There was an unofficial template based on Laravel in the past, although I'm not sure it was a proper template or just everyone copying the same application (iegreview or scholarships, don't remember which was first). @bd808 would probably know more.

@Tgr may be thinking of a couple of things:

  • https://gerrit.wikimedia.org/g/wikimedia/slimapp is a PHP Slim based application framework that was extracted from the combination of the Wikimania Scholarships and Wikimedia Grants Review applications that @Niharika and I wrote back in the day. I have used this framework as the basis of several Toolforge tools. The version of Slim that it wraps is very old these days, but that could probably be updated in a 2.x series if it otherwise seemed useful.
  • https://github.com/wikimedia/ToolforgeBundle is a Symfony bundle written primarily by @Samwilson and @MusikAnimal to simplify various tools that they have written in Toolforge.

Also, it would be great if in the meantime we start slicing MediaWiki into a set of libraries, to get components that are easy to reuse.

That was the point of the https://www.mediawiki.org/wiki/Library_infrastructure_for_MediaWiki project that was only officially staffed in October-December 2014. The legacy of the project is support of Composer for library management in MediaWiki, the Librarization tag here in Phabricator, and various extracted libraries.

Shellbox is a related use case, although it's implemented as a monorepo, not a template.

There are cookiecutter templates for some other things (libraries, extensions).

https://github.com/wikimedia/ToolforgeBundle is a Symfony bundle written primarily by @Samwilson and @MusikAnimal to simplify various tools that they have written in Toolforge.

I'd be keen to help make ToolforgeBundle useful for more use cases, if that's worthwhile. :-) It is mentioned in the description:

wikimedia/ToolforgeBundle is an example of what I'm talking about, though I think we'd want something more specifically tailored to API use cases.

@kostajh what do you mean by tailored to API use cases? As it stands, the bundle doesn't do much specifically for creating APIs, but Symfony is pretty good for that, and we've used in the past in conjunction with things like NelmioApiDocBundle to create OpenAPI-compatible endpoints.

Perhaps due to lack of in-house expertise, we haven't been making many improvements/fixes to wikimedia/service-template-node for several years. To give one example, it has dependencies on deprecated projects, and in other cases is using deprecated dependencies released ~5 years ago. This makes it hard/impossible to run tools like osv-scanner on apps that build on top of wikimedia/service-template-node.

For what it's worth, most of the recent work has been to replace what service-template did with (what is being called) scaffolding (see https://gerrit.wikimedia.org/g/mediawiki/services/service-scaffold-node for the nascent node version, and https://gerrit.wikimedia.org/g/mediawiki/services/service-scaffold-golang for golang). The main differences being that the scaffolding repository would be limited to some very basic idiomatic examples, with standardized build files, that imported any common code from separately maintained libraries (see https://gerrit.wikimedia.org/g/mediawiki/services/servicelib-node & https://gerrit.wikimedia.org/g/mediawiki/services/servicelib-golang respectively). You'd make a copy of the scaffold to start a project, but you'd truncate the repo; it wouldn't be a fork that you continually merged changes in from, as was necessary for service-template. Think: all those scaffold scripts that frameworks use to init a new project, only without the script.

That sounds good. ToolforgeBundle has a thing like that for new projects, where a new tool can be created with composer create-project wikimedia/toolforge-skeleton ./my-cool-tool (it has an auto-deployment script, but not much else for the actual hosting; it's all optimized for Toolforge, although we're also using it on VPSs).

One more hypothesis for PHP IMO is that it requires significantly less long-term maintenance effort than Node when base OS versions (and as a consequence language versions) change, because the library ecosystem tends to be more stable. (Not sure how to quantify and test that, though; it's just a personal impression based on how abandoned Node services fared. Also, no idea if the same is true for Go.)

There is some interest in the community in writing tooling in Rust (see mwbot-rs), and that is somewhat compatible with PHP via ext-php-rs so high-performance services could in theory be written in a mix of PHP and Rust (which even has some level of support for machine learning). It would have to involve some not-very-mature components, though.

Krinkle subscribed.

This is relevant to an issue we discussed at the offsite last month around offering teams the option to migrate legacy Node.js services to e.g. the new MediaWiki REST API (extension) and/or a standalone PHP service (if indeed it qualifies to be its own service, under latest SRE guidance at https://www.mediawiki.org/wiki/Wikimedia_services_policy and T239856).

One more hypothesis for PHP IMO is that it requires significantly less long-term maintenance effort than Node when base OS versions (and as a consequence language versions) change, because the library ecosystem tends to be more stable. (Not sure how to quantify and test that, though; it's just a personal impression based on how abandoned Node services fared. Also, no idea if the same is true for Go.)

Is there any drawback to having a separate library, managed using whatever system is idiomatic to the language? Or conversely, is there an advantage to having projects be a fork of the template?

There is some interest in the community in writing tooling in Rust (see mwbot-rs), and that is somewhat compatible with PHP via ext-php-rs so high-performance services could in theory be written in a mix of PHP and Rust (which even has some level of support for machine learning). It would have to involve some not-very-mature components, though.

This is relevant to an issue we discussed at the offsite last month around offering teams the option to migrate legacy Node.js services to e.g. the new MediaWiki REST API (extension) and/or a standalone PHP service (if indeed it qualifies to be its own service, under latest SRE guidance at https://www.mediawiki.org/wiki/Wikimedia_services_policy and T239856).

It's worth noting that Tim designed the REST framework for MediaWiki to be independent of MediaWiki - this has slipped a bit, but it would not be very hard to extract it and make it functional outside MW as well.

Is there any drawback to having a separate library, managed using whatever system is idiomatic to the language? Or conversely, is there an advantage to having projects be a fork of the template?

Libraries are somewhat limited in what they can do (you probably won't start with a somewhat-set-up root folder, although e.g. Composer can probably do that via custom install hooks), but as you said they can be complemented by some sort of scaffolding tool. On the whole I'd expect libraries to be a much more manageable approach.

We had a meeting about this yesterday, with Bill, Thomas, Chris, Andrew, Daniel, Timo, Roan, Subbu

There seems to be a lot of support for the idea.

Below is a summary of our discussion, paraphrased based on my memory:

Bill: have been writing services in two ways in the past: 1) using Node.js, which requires a different way of thinking from PHP. 2) using GoLang, for AQS 2.0. Much more similar to PHP in terms of mental model. GoLang seemed approachable, wondering if there's a future for that.

Chris: Off the cuff, seems like obviously a good idea from an SRE perspective. Scaffolding is always better than templates. Should be opinionated, make services think about the platform in a way that matches the usual standards of our production environment -- metrics, log formatting, tracing. There should only be a few well-supported options of framework/language for services. No strong feelings about Node.js. PHP seems to make a lot of sense, but not hugely attached to it. Either way, good scaffolding is key.
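As a rough illustration of what "opinionated" could mean for log formatting and metrics, here is a minimal sketch of helpers a scaffold might ship. It is written in Python purely as an example; the field names, `build_logger` and `RequestMetrics` are hypothetical and do not reflect any existing Wikimedia library or logging schema.

```python
import json
import logging
from collections import Counter


class JsonFormatter(logging.Formatter):
    """Render every log record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(service_name, stream=None):
    """Return a logger preconfigured with the shared JSON format."""
    logger = logging.getLogger(service_name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)  # defaults to stderr
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


class RequestMetrics:
    """Tiny in-memory request counter keyed by (route, status)."""

    def __init__(self):
        self.requests_total = Counter()

    def observe(self, route, status):
        self.requests_total[(route, status)] += 1
```

A service built from the scaffold would call `build_logger("my-service")` once at startup and get the standard format without having to make any decisions of its own.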

Andrew: service-runner is opinionated. Generally pro supporting PHP for standalone services. Re-using libraries we developed for MediaWiki would be useful. We're re-inventing the wheel for each language we add. Perhaps it would also help with growing interest in modularizing MediaWiki (both in terms of services and re-usable libraries).

Chris: One thing we don't want to re-invent is a client for PoolCounter. We have it in PHP and Python and... need more?

Daniel: would be nice to hear the voice of engineers from a frontend-centric team. People working on client-side code tend to have more expertise in JavaScript and favor that tech for the server side as well.

Timo: PHP and Kubernetes could function as an RPC endpoint. Sharing code with MediaWiki is the key point. But why run a separate process, when you can just make it an extension? That's simpler and faster. Should remain the default.

Chris: could we have some extensions run as a service (without MW)? Timo: we have done things like that in the past, so why not...

Daniel: the answer is always "it depends", so what does it depend on? PHP is slow to start up, so not a good choice for things that need low latency responses. So conversely, it should be considered for long running things, like transcoding.

Timo: you wouldn't write memcached in PHP, but startup time is < 1ms these days

Roan: server-side Vue rendering as a use case. Sooner, server-side graph rendering as a use case. We'll want to use JS for that because that's what the graph library is written in. But another escape hatch is Shellbox. When you need a pure function as a service, one option we have is to write something that can be invoked from the CLI and wrap it within Shellbox. That way, you can use libraries in other languages, and still use PHP for serving HTTP requests. We'll have to look into scaling Shellbox (especially for the use case of SSR of Vue).
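To make the "pure function invoked from the CLI" idea concrete, here is a minimal sketch of the shape such a command could take: JSON in on stdin, JSON out on stdout, no other side effects. It is in Python only as an illustration; `render` and its input fields are hypothetical, and Shellbox's own interface is deliberately not shown or assumed.

```python
import json
import sys


def render(payload):
    """Hypothetical pure function: the same input always yields the same output."""
    title = str(payload.get("title", ""))
    items = [str(item) for item in payload.get("items", [])]
    lines = [title.upper(), *(f"- {item}" for item in items)]
    return {"text": "\n".join(lines)}


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON object from stdin and write one JSON object to stdout.

    Returns a process exit code: 0 on success, 2 on invalid input.
    """
    try:
        payload = json.load(stdin)
    except ValueError as exc:
        print(f"invalid JSON input: {exc}", file=sys.stderr)
        return 2
    if not isinstance(payload, dict):
        print("expected a JSON object", file=sys.stderr)
        return 2
    json.dump(render(payload), stdout)
    stdout.write("\n")
    return 0

# When installed as a script, the entry point would be: sys.exit(main())
```

Because the command keeps no state between runs, whatever wraps it can start as many copies as needed, which is what makes the scaling question above tractable.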

Timo: we should also think about how it scales back down, for small third-party installs or development setups. Using Shellbox doesn't remove the dependency on Node.js in that scenario; it adds a dependency on setting up Shellbox.

Daniel: Can we come up with a checklist for deciding for or against a PHP standalone service?

Timo: we have https://www.mediawiki.org/wiki/Wikimedia_services_policy which included such a checklist in an earlier draft, https://www.mediawiki.org/wiki/User:EEvans_(WMF)/Scratch/Standards_for_external_services

Daniel: why not use MW as the scaffolding for PHP services?

Timo: that could be the default because it's simple, but there are valid reasons not to. Like security and perhaps also performance. But we are making MW more flexible, which makes it more suitable for this use case. E.g. by introducing virtual DB hosts that allow us to move DB tables to a separate cluster.

Chris: Even if we do that, we should improve the scaffolding for MW (particularly JobRunners).

Bill: One use case is running code that needs libraries / PHP versions / etc that are incompatible with Mediawiki

Daniel: to explore the other direction - do we want to move away from node.js?

Chris: we probably need *some* node.js if we want to use certain JS libraries. But that doesn't mean it needs to be recommended.

Timo: wouldn't want to re-implement complex tech, like MathJax or Zotero. We need a mechanism for exporting metrics for observability, which probably means we'd want to run these in a Node-based web server (Express).

Thomas: Coming from work that is largely removed from MW: the entire Event Platform is based on Node.js, and we probably wouldn't want to move away from that. It makes sense to prefer PHP for things close to MW, but having one part of WMF discouraging Node.js while another part relies on it entirely would be confusing.

Subbu: PHP in 2024 is very different from PHP in 2012. Parsoid would not have gotten off the ground in 2012 in PHP for a whole bunch of reasons. Node was the right choice at the time, but rewriting it in PHP later was also the right choice. Don't need to blanket discourage other languages/environments, but, if PHP is good enough for your problem, why not use it?

Daniel: Okay it sounds like everyone is onboard with a PHP scaffolding, but where do we draw the line at adding more environments? We'll have Node.js for a long time, we have Golang, we'll probably have Python services for a while (e.g. thumbor) ... where do we draw the line?

Bill: Kask already existing was a big push in favor of using Go for AQS v2; we couldn't have justified it otherwise. In any case, we'd have to actually have resourcing for creating the scaffolding.

Thomas: there seems little appetite for creating more services in Go and maintaining infrastructure for it.

Timo: Resourcing is the main reason we want to keep the number of languages low. Implementing the same logic in several languages can be expensive. It's very hard to beat the time-to-market of an MW extension. MW is a powerful framework, and it has strong support in the org.

Timo: Speaking of Go, there is discussion of discontinuing Kask and using Memcached directly for sessions. With respect to the event platform, we'd be hard pressed to write something like EventGateway or ChangeProp in PHP. You want a persistent service for that kind of thing.

We had a meeting about this yesterday, with Bill, Thomas, Chris, Andrew, Daniel, Timo, Roan, Subbu

There seems to be a lot of support for the idea.

Below is a summary of our discussion, paraphrased based on my memory:

<snip>

Thanks for the summary, Daniel! It largely matches my recollection.

I'll add that my takeaway from the meeting was that nobody seemed strongly opposed to the idea of using PHP for services. Much of the discussion, at least as I recall it, was people poking at the idea from different directions to make sure the idea was sensible and not redundant with any existing approach. My sense of the room was that there are other approaches that are necessary/superior for certain things, but that services written in PHP likely make sense for other situations. In other words, a PHP-based template/scaffolding/servicelib approach would never be the only tool in our toolbox (and I'm not aware of anyone even suggesting that it would be). But it would probably be something worth having.

If anyone feels that I'm misrepresenting the group sentiment, or their individual opinion, please correct me.