
Profile single-version and multi-version image builds to identify bottlenecks
Closed, ResolvedPublic

Description

The multiversion MediaWiki image build currently takes 12-17 minutes, which is slow enough to be a problem for certain user workflows such as MediaWiki backport deployment.

It's unclear exactly where the bottleneck is at the moment, but the build does appear disk-bound at certain points, such as when the single-version images (up to 2) are pulled, mounted, and copied into the final image. More profiling is needed. @dancy already has some measurements from vmstat.
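One minimal way to collect such measurements is to sample vmstat in the background for the duration of the build. This is a sketch: `build-image.sh` stands in for the actual image-build command, which is not named in the task.

```shell
# Sample CPU and I/O statistics once per second, with timestamps, while
# the build runs. "build-image.sh" is a placeholder for the real command.
vmstat -t 1 > vmstat.log &
VMSTAT_PID=$!
./build-image.sh
kill "$VMSTAT_PID"
# In vmstat.log, sustained high "wa" (I/O wait) points at disk,
# while high "us"+"sy" with low "wa" points at CPU.
```

Reading the log afterwards alongside wall-clock phases of the build makes it easier to attribute slow stretches to disk or CPU.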

Releases hosts are currently spec'd with 4 GB of memory and 2 vCPUs. Disk specs are unclear. Much of the memory (~1.5 GB) is currently taken up by Jenkins.

Event Timeline

@akosiaris I don't see hardware or VM specifications for releases1002 on wikitech. Do you know what our options are if we need to request more RAM or faster disks?

You can request more RAM for sure, but up to a limit (IIRC 8GB), as some operations like live migrations become problematic past a certain threshold. However, per https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&refresh=5m&var-server=releases1002&var-datasource=thanos&var-cluster=misc&from=now-7d&to=now the VM isn't really using that RAM; about 50% of it is page cache, so you won't really see a big difference there.

Disk-wise, the VM is already on SSDs, so there's not much we can do there.

What I would suggest is increasing the number of vCPUs. Per https://grafana.wikimedia.org/d/000000377/host-overview?viewPanel=3&orgId=1&var-server=releases1002&var-datasource=thanos&var-cluster=misc&from=1617642691437&to=1617646499791 and P15150 that @dancy pasted (thanks for that), it's pretty clear the CPU is the limit. Wanna file a task for increasing it to, say, 6 vCPUs?
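As a quick cross-check of CPU saturation during a build (a sketch, not one of the measurements above): compare the load average against the vCPU count.

```shell
# If the 1-minute load average persistently exceeds the vCPU count
# (currently 2 on releases1002), runnable work is queuing on the CPU
# and more vCPUs should help.
nproc
uptime
```

With only 2 vCPUs, a load average hovering at or above 2 for the length of the build is consistent with the Grafana graphs showing CPU as the limit.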

Some additional measurements:

Max single-threaded unbuffered file write speed on /dev/vda1, writing a 6 GB file full of zeros:

dancy@releases1002:~$ dd if=/dev/zero of=bigfile bs=1024k count=6000 oflag=direct
6000+0 records in
6000+0 records out
6291456000 bytes (6.3 GB, 5.9 GiB) copied, 81.0061 s, 77.7 MB/s

Same, but with normal buffering and a sync at the end. (The first `real` below is from the inner `time sync`; the second is the total for the whole command.)

dancy@releases1002:~$ time (dd if=/dev/zero of=bigfile bs=1024k count=6000 ; echo syncing; time sync)
6000+0 records in
6000+0 records out
6291456000 bytes (6.3 GB, 5.9 GiB) copied, 55.0782 s, 114 MB/s
syncing

real 0m4.536s
user 0m0.009s
sys 0m0.000s

real 0m59.618s
user 0m0.053s
sys 0m8.854s

thcipriani assigned this task to dancy.
thcipriani subscribed.

Data collection done, experiments to continue