
Raise quota on wikiqlever so that an instance with 256 GB RAM and 3 x 4 TB SSD can be launched
Closed, Resolved (Public)

Description

Project Name: wikiqlever
Type of quota increase requested: ram/disk
Amount of quota increase: enough to create one large instance to run https://github.com/qlever-dev/qlever-control/blob/main/src/qlever/Qleverfiles/Qleverfile.wikidata
Reason: Explore a migration path for the backend of Scholia, which currently uses https://query-legacy-full.wikidata.org/, a legacy, unsplit endpoint scheduled to run until December 2025

Maybe we can create the project with the default quotas, and you can ask for whatever quota bump later once the data requirements are clear to you?

This follows up on T377655, where the machine was set up with default quota. This was enough to test QLever with a small dataset but not enough to test it with Wikidata.

This is meant to replace the blazegraph-legacy setup, which is to be switched off by January 7, as per T411410.

Having an environment on which to test QLever would be aligned with the broader efforts to evaluate Blazegraph alternatives, as per T206560. There is also a dedicated ticket for exploring hardware requirements for such SPARQL backend candidates: T306726.

Event Timeline

Physikerwelt renamed this task from Raise quota on wikiqlever machine to 256 GB RAM and 3 x 4 TB SSD to Raise quota on wikiqlever so that an instance with 256 GB RAM and 3 x 4 TB SSD can be launched. Dec 18 2025, 3:22 PM

Most likely, we need some additional support to create an instance with this flavor.

Daniel_Mietchen updated the task description.
Daniel_Mietchen added subscribers: BTracy-WMF, gmodena.
JJMC89 changed the task status from Open to Stalled. Dec 18 2025, 4:11 PM
JJMC89 raised the priority of this task from High to Needs Triage.
JJMC89 added a project: cloud-services-team.
JJMC89 edited subscribers, added: JJMC89; removed: aborrero.

See Cloud-VPS (Quota-requests) for how your request needs to be formulated. Priority is determined by the WMCS team, not the requestor.

@JJMC89 I filled out the form. As described in the documentation:

# Qleverfile for Wikidata, use with the QLever CLI (`pip install qlever`)
#
# qlever get-data  # ~7 hours, ~110 GB (compressed), ~20 billion triples
# qlever index     # ~5 hours, ~20 GB RAM, ~500 GB index size on disk
# qlever start     # a few seconds, adjust MEMORY_FOR_QUERIES as needed
#
# Adding a text index takes an additional ~2 hours and ~50 GB of disk space
#
# Measured on an AMD Ryzen 9 5950X with 128 GB RAM, and NVMe SSD (18.10.2024)

When it is used for https://scholia.toolforge.org there will be quite a load, so it would be good to have a bit more than the absolute minimum. Also, to be able to upgrade, we would need two instances for swapping. (This can be step two.)
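For a rough sense of that minimum, the sizes quoted in the Qleverfile comments above can be added up. This is only a back-of-the-envelope sketch; the headroom figure is my own assumption, not from the Qleverfile:

```shell
# Approximate sizes in GB, taken from the Qleverfile comments above
download=110     # compressed .ttl.bz2 dumps (qlever get-data)
index=500        # index size on disk (qlever index)
text_index=50    # optional text index
headroom=200     # assumed scratch space for temporary files (my guess)

total=$((download + index + text_index + headroom))
echo "estimated minimum disk: ${total} GB"
```

That already lands near 1 TB for a single copy of the data, before any allowance for a second instance to swap during upgrades.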

JJMC89 changed the task status from Stalled to Open. Dec 18 2025, 9:25 PM
fgiunchedi triaged this task as Medium priority. Dec 19 2025, 1:25 PM

Being a 16x RAM quota bump, it'll require sign-off from WMCS folks other than clinic duty (i.e. me at the moment). From my reading of related tasks, the RAM is going to be on a single VM (?), so I believe we'll need a bigger flavor as well.

Having a non-split query service available to interested users is going to be useful during the period from the end of the legacy service to the time that the new WDQS is available. This alternative service probably doesn't need the same uptime characteristics as even the WDQS.

@fgiunchedi @taavi We can probably start testing meaningfully with a 128GB RAM setup. We do envision significant load, though, which is why this ticket requests 256 GB.

@fgiunchedi @taavi We can probably start testing meaningfully with a 128GB RAM setup. We do envision significant load, though, which is why this ticket requests 256 GB.

In https://phabricator.wikimedia.org/T377655#11473315 we documented the effort to do the setup. Even if we don't need to download the data, we need approximately 1 TB of disk. We could give it a try with a 32 GB, 64 GB, or 96 GB flavor. If that works, we can maybe scale horizontally. To be a good fit for the WMCloud infrastructure, I think it's desirable not to have too much memory in one VM.

Hey, I have no technical understanding of what it means to request computing resources, but I do WikiCite / scholarly Wikidata curation, so I care that the WikiQlever team has what they need before the 7 January migration. The team needs a speedy decision on computing resources so that we can migrate Scholia to its new location, https://qlever.scholia.wiki/, and keep it operational when the split finalizes. All of this is a miraculous, unexpected save: QLever fell out of the sky as a free and open-source backend from a nonprofit org, and it works better than Wikidata's Blazegraph backend. The WikiCite / Scholia team is using QLever as an alternative to keep the popular service going while also complying with WMF plans to get the WikiCite project mostly out of the Wikidata main graph.

We are having a hackathon now, as documented at https://www.wikidata.org/wiki/Wikidata:Scholia/Events/2025_12, in anticipation of the scheduled 7 January 2026 Wikidata graph split. For our volunteer team, this time of year is holiday time spent on production and development, but for paid staff, it is offline vacation time.

I am writing to ask for a quick decision about whether this resource can be allocated, because the typical decision makers for this may be gone until after the scheduled 7 January graph split. I do not have any understanding of the cost of fulfilling this request in terms of labor or machine use, but if this request is the kind of thing that could be granted, then please grant it now.

We have had great, steady progress as a volunteer team for the past year and a half since first making this request in T377655, when we got the answer that we should develop the project more until we hit limits. The limits are now hit, and for lots of reasons, we have people contributing to development right now as the deadline approaches.

Is this request something to which you can give a yes/no, preferably yes, before staff holiday vacations?

Physikerwelt raised the priority of this task from Medium to High. Jan 5 2026, 6:41 PM

Raising priority given the proximity to the deadline on Jan 7th.

taavi lowered the priority of this task from High to Medium. Jan 5 2026, 6:46 PM

Resetting priority. Requests of this size can only be granted following a cloud-services-team meeting, and the next one of those is scheduled on Thursday, 8 January.

Amount of quota increase: enough to create one large instance to run https://github.com/qlever-dev/qlever-control/blob/main/src/qlever/Qleverfiles/Qleverfile.wikidata

This is not an amount I can plug into our tooling. Please provide actual numbers instead of requiring us to guess something.

That's a problem.

Resetting priority. Requests of this size can only be granted following a cloud-services-team meeting, and the next one of those is scheduled on Thursday, 8 January.

That doesn't seem to fit the graph split plan.

Amount of quota increase: enough to create one large instance to run https://github.com/qlever-dev/qlever-control/blob/main/src/qlever/Qleverfiles/Qleverfile.wikidata

This is not an amount I can plug into our tooling. Please provide actual numbers instead of requiring us to guess something.

I don't know, but we could test it to find out.

Can you assign 1 TB of disk and 32 GB of RAM? If that doesn't work, we file a new request.

Can you assign 1 TB of disk and 32 GB of RAM?

+1 for increasing the quota to these values.

Can you assign 1 TB of disk and 32 GB of RAM?

+1 for increasing the quota to these values.

This can be good for an initial test. While I don't know the specs of the machine running WDQS, I don't think it's realistic that this amount is sufficient to replace it. Eventually, 256 GB RAM and 3 x 4 TB SSD seem to be a realistic guess.

@Physikerwelt Do you also need the vCPU limit to be increased? I see that the vCPUs are all used right now.

While I don't know the specs of the machine running WDQS

Depends which WDQS machines/clusters you're looking at; there are various WDQS clusters. You can find most of the info in Grafana, either via the cluster-overview (example in eqiad for one of the clusters) or via the host-overview (example with a random wdqs host).

I guess the flavor g4.cores8.ram32.disk20 would allow testing without hitting the vCPU limit.

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-07T17:07:52Z] <volans@cloudcumin1001> START - Cookbook wmcs.openstack.quota_increase by 944 gigabytes, 16384 ram (T413097)

Mentioned in SAL (#wikimedia-cloud-feed) [2026-01-07T17:08:00Z] <volans@cloudcumin1001> END (PASS) - Cookbook wmcs.openstack.quota_increase (exit_code=0) by 944 gigabytes, 16384 ram (T413097)

Limits increased to 1TB of disk and 32G of RAM.

Thank you. That worked.
Do you happen to know how I can get https://wikitech.wikimedia.org/wiki/Help:Shared_storage#/public/dumps mounted to /public/dumps? The math cluster instances have:

clouddumps1002.wikimedia.org:/                                  196T  104T   83T  56% /mnt/nfs/dumps-clouddumps1002.wikimedia.org
clouddumps1001.wikimedia.org:/                                  196T  106T   81T  57% /mnt/nfs/dumps-clouddumps1001.wikimedia.org
scratch.svc.cloudinfra-nfs.eqiad1.wikimedia.cloud:/srv/scratch  3.0T  1.4T  1.5T  49% /mnt/nfs/secondary-scratch

The qlever1 instance does not seem to have any NFS mounts. For now, we can also download the data via HTTP. However, all we need to do is roughly this:

GET_DATA_URL      = https://dumps.wikimedia.org/wikidatawiki/entities
GET_DATA_CMD      = curl -LRC - -O ${GET_DATA_URL}/latest-all.ttl.bz2 -O ${GET_DATA_URL}/latest-lexemes.ttl.bz2 2>&1 | tee wikidata.download-log.txt && curl -sL ${GET_DATA_URL}/dcatap.rdf | docker run -i --rm -v $$(pwd):/data stain/jena riot --syntax=RDF/XML --output=NT /dev/stdin > dcatap.nt

so it might be better to get the data from NFS.
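If the dumps share were mounted, GET_DATA_CMD could point at the local files instead of downloading them. A hypothetical Qleverfile fragment, assuming the share mirrors dumps.wikimedia.org under /public/dumps/public (a path I have not verified on this instance):

```ini
# Hypothetical: read the Wikidata dumps from the NFS share instead of over HTTP.
# GET_DATA_DIR and its path are assumptions; check the actual mount layout first.
GET_DATA_DIR      = /public/dumps/public/wikidatawiki/entities
GET_DATA_CMD      = ln -sf ${GET_DATA_DIR}/latest-all.ttl.bz2 ${GET_DATA_DIR}/latest-lexemes.ttl.bz2 .
```

Symlinking rather than copying would avoid duplicating roughly 110 GB of compressed dumps on the local disk.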

Just a bit of context for the long run: a VM with 256 GB of RAM is technically possible, but at that size live migration/rescheduling is likely to work poorly. That means that when we perform hardware maintenance, I would probably need to shut it down for a few minutes rather than moving it instantly to new hardware. If you need 100% VM uptime, then I advise a design that uses a cluster of small VMs rather than one giant one.

Limits increased to 1TB of disk and 32G of RAM.

Great! :D

The page says "Cloud VPS users can request to have the share available." Unfortunately, it does not include a link for where to file such a request.

The page says "Cloud VPS users can request to have the share available." Unfortunately, it does not include a link for where to file such a request.

@So9q I updated the wiki page linking to an example task, a task like this would work: T398477: Requesting access to NFS mount /public/dumps for dumpstorrents Cloud VPS project

taavi assigned this task to Volans.