Request creation of humaniki-2 VPS project
Closed, Declined (Public)

Description

Project Name: humaniki-2

Developer account usernames of requestors: @Danya

Purpose: This will be the spiritual successor to https://humaniki.wmcloud.org/, which is no longer updated.

Brief description:
The website is currently hosted on Toolforge (workboard), using an SQLite database and the "webservice" framework. Toolforge has proven to be too limiting for the project’s ambitions: we need to write a Wikidata dump ingester in a language faster than the current Python version (Java, Rust, or C), but it is impossible to run and/or compile these languages on Toolforge. This ingester also needs to be run as a cron job, which proved too difficult to do properly on Toolforge.
The only remaining option is using Cloud VPS, where we will need to set up a web server with an associated database instance. From what we’ve read of the ToolsDB policy, it forbids persistent or idle connections, so we would need to install a local MariaDB (GPLv2) instance or similar. Within a few months or years there might be a need for more than the default 20 GB of storage, as the database is expected to grow by a few GB per month.
The instance will also require access to the shared NFS directories to be able to read Wikidata dumps in /public/dumps/public/wikidatawiki/entities/.
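For readers unfamiliar with the ingester's job, here is a minimal Python sketch of streaming a Wikidata JSON dump without loading it into memory. It assumes the dump is a gzip-compressed JSON array with one entity per line, which is how the Wikidata JSON dumps are laid out. The function names and the file name in the usage note are illustrative and do not come from the tool's actual code.

```python
import gzip
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump."""
    for line in lines:
        # Each entity line ends with a trailing comma except the last one.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and any blank lines
        yield json.loads(line)


def iter_dump(path: str) -> Iterator[dict]:
    """Stream entities from a gzip-compressed dump file (hypothetical helper)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from iter_entities(handle)
```

A caller would loop over something like iter_dump("/public/dumps/public/wikidatawiki/entities/latest-all.json.gz"), where the file name is an assumption, and process each entity as it arrives.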

How soon you are hoping this can be fulfilled: This month, as we need to have a basic version of the website up and running for Wikimania 2026 where it will be officially unveiled to the public for the first time.

Event Timeline

using an SQLite database

Running MediaWiki with ToolsDB as a backend is likely to be more performant. The https://gitlab.wikimedia.org/toolforge-repos/mwdemo repository shows one possible method of setting up MediaWiki with a ToolsDB backend.

(Java, Rust, or C), but it is impossible to run and/or compile these languages on Toolforge.

Citation needed. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Building_container_images supports both Java and Rust explicitly. With a bit of abuse of the Python buildpack I think it would also be reasonably possible to compile C code during a container build. https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/blob/main/hatch_build.py shows how to run arbitrary Python code as part of the Python buildpack's work. That arbitrary Python could be "run this make command."
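As a rough illustration of the "run this make command" idea, here is a sketch of a custom Hatch build hook. It is not a copy of the linked hatch_build.py. It assumes the project is installed through a pyproject.toml that uses hatchling as its build backend with a custom build hook enabled, and that a Makefile with a target named ingester exists. Both the target name and the helper function are hypothetical.

```python
"""hatch_build.py: hypothetical custom build hook that compiles native code."""
import subprocess
from pathlib import Path

try:
    from hatchling.builders.hooks.plugin.interface import BuildHookInterface
except ImportError:
    # Fallback so the helper below can be tried without hatchling installed.
    BuildHookInterface = object


def run_build_command(command, cwd):
    """Run an external build command and raise if it exits with an error."""
    completed = subprocess.run(
        command, cwd=cwd, check=True, capture_output=True, text=True
    )
    return completed.stdout


class CustomBuildHook(BuildHookInterface):
    """Hatch calls initialize() before it builds the package."""

    def initialize(self, version, build_data):
        # self.root is the project root directory supplied by hatchling.
        # "make ingester" is an assumed target name, not one from the thread.
        run_build_command(["make", "ingester"], cwd=Path(self.root))
```

Because the hook only shells out, the same approach could call a Java or Rust build tool instead of make, provided that tool is available in the build image.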

This ingester also needs to be run as a cron job, which proved too difficult to do properly on Toolforge.

Citation needed. https://wikitech.wikimedia.org/wiki/Help:Toolforge/Running_jobs

I'll answer your remarks in order:

  1. I don't want to run a MediaWiki instance alongside my tool. Would it be ok to use ToolsDB for a webservice with a persistent connection? From what I've read in the policy, it feels like it wouldn't (see my original message), but maybe I have missed something?
  2. My issue is I need to run Java or similar inside my already existing Python container image, is it possible? I know it is possible to run node/js if a package.json file is present inside an image for another language (which is what I'm already doing), but I didn't see a similar way to run other languages on Wikitech.
  3. The cron job needs to run inside the Python container image, which I have been unable to do so far. (To be honest, I never thought about trying to run the toolforge jobs command from within a container image, I'll try it this evening.)

I'll answer your remarks in order:

  1. I don't want to run a MediaWiki instance alongside my tool.

Sorry I misunderstood that part.

Would it be ok to use ToolsDB for a webservice with a persistent connection?

Can you explain how a persistent connection is necessary for your application's business logic? This feels like an XY problem where there may be a different path to achieve your goals than you have personally seen thus far. If your tool truly is actively reading and writing to the database continuously then it may not be trivially possible to work around. There would however in that case be a third option for database storage.

  2. My issue is I need to run Java or similar inside my already existing Python container image, is it possible? I know it is possible to run node/js if a package.json file is present inside an image for another language (which is what I'm already doing), but I didn't see a similar way to run other languages on Wikitech.

We do not have any complete tutorials on more complex runtime requirements like this, but I do think with a bit of experimentation it should be possible. The tools to use are the Toolforge Build Service and the deb-packages buildpack.

Your Python needs should be taken care of automatically if you have a requirements.txt file in the root of your project repository. That file is the trigger for the Python buildpack to run. Then you can add a project.toml file describing additional packages to install using apt. It sounds like you would need to install OpenJDK at least.

To use the newly installed OpenJDK tooling to compile Java classes you would use a technique like https://gitlab.wikimedia.org/toolforge-repos/bd808-buildpack-perl-bastion/-/blob/main/hatch_build.py to run the necessary Java build tools.

  3. The cron job needs to run inside the Python container image, which I have been unable to do so far. (To be honest, I never thought about trying to run the toolforge jobs command from within a container image, I'll try it this evening.)

The toolforge jobs system would run an existing program inside a container of your choice, often a custom container built via Toolforge Build Service. The Procfile in a build service image can describe multiple entry points to make running the container as a one-off, scheduled, or continuous job easier. See wikibugs for an example where a webservice and four other jobs are run from the same container using the jobs service. (The example jobs are all continuous, but that is just a configuration detail.)

A different path would be to talk with the folks who run the wikidumpparse Cloud VPS project which is the current home of the https://humaniki.wmcloud.org project and see if your project can become part of theirs. The humaniki-2 name you have chosen makes it seem like you would like your project to be associated with theirs in the minds of users.

For the persistent db connection, it stems from my own habits. That said, I have just discovered that there is a way with the framework I use (Flask) to have non-persistent database connections, so this issue may no longer be relevant.
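One way to picture a non-persistent connection is the sketch below, which opens a connection for a single unit of work and always closes it afterwards. It uses the standard-library sqlite3 module so that it runs as-is. The database path and the function names are hypothetical, and a ToolsDB setup would presumably swap in a MariaDB/MySQL driver.

```python
import sqlite3
from contextlib import contextmanager

DATABASE = "humaniki.sqlite3"  # hypothetical path, not taken from the thread


@contextmanager
def db_connection(database=DATABASE):
    """Open a connection for one unit of work and always close it afterwards."""
    connection = sqlite3.connect(database)
    try:
        yield connection
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        # Nothing stays open or idle between requests.
        connection.close()


def handle_request(database=DATABASE):
    """Stand-in for a request handler that needs the database briefly."""
    with db_connection(database) as connection:
        return connection.execute("SELECT 1 + 1").fetchone()[0]
```

In a Flask application the same open-and-close cycle is usually attached to the request lifecycle by storing the connection on flask.g and closing it in a function registered with app.teardown_appcontext.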
I already have a Procfile and requirements.txt that are automatically called when building and starting the server with toolforge build start <repo URL> and webservice start, but it seems I need to learn what more can be done with the Procfile and project.toml ^^
I’ll take a look at Trove, it seems I’ll need it relatively soon.

I must say this is the first time I have had to manage a web server, so I still have some learning to do.
Thank you very much for your patience and your detailed answers :)

Hello @Danya, do you still need help with this, or can we conclude that Toolforge is enough for your use case and thus close this request?
@bd808 thanks for assisting.

Hi @Raymond_Ndibe, I think I’ll stick with Toolforge for now as it seems the blocking points have been at least partially addressed. The request can be closed.