- Project Name: wikisource
- Type of quota increase requested: CPU and RAM for IA Upload
- Amount of quota increase: Enough for one tiny test instance (e.g. 1CPU, 2GB RAM) and one large prod instance (e.g. 4CPU, 8GB RAM) both with standard 20GB disk
- Reason: IA Upload does lots of image converting and filesystem operations and is not running well on Toolforge. We had similar issues with Wikisource OCR and saw massive improvements when it moved to VPS (we think mainly because of filesystem access).
Current quota for wikisource is:
11 vCPU and 24G RAM
It seems the proposal would be to bump to 16 vCPU and 34G RAM.
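For reference, the proposed figures follow from adding the two requested instances to the current quota. A minimal sketch of that arithmetic, assuming the example sizes from the request (1 CPU / 2 GB and 4 CPU / 8 GB) are the ones actually used; the function and dictionary names are illustrative only:

```python
# Current project quota, as stated above.
CURRENT_QUOTA = {"vcpu": 11, "ram_gb": 24}

# Example instance sizes from the request (assumed, not confirmed flavors).
REQUESTED_INSTANCES = [
    {"vcpu": 1, "ram_gb": 2},  # tiny test instance
    {"vcpu": 4, "ram_gb": 8},  # large prod instance
]


def proposed_quota(current, instances):
    """Return the current quota plus the resources of each new instance."""
    total = dict(current)
    for instance in instances:
        for key, value in instance.items():
            total[key] += value
    return total


print(proposed_quota(CURRENT_QUOTA, REQUESTED_INSTANCES))
```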
+1. It would be interesting, though, to get some info on what's so slow on Toolforge (for future improvements). @Samwilson, if you have done any investigation and have any notes, can you share them as well? Thanks!
+1 from me. We know that our current NFS IOPS are not ideal for large file manipulations.
Thanks, that quota looks great. I'll make the new instances now.
I'm afraid I don't have any hard evidence to present as to why it's slow, as it's always been hard to reliably replicate.
My feeling was that it was due to being on NFS, because we have at times seen slowness even with things like git status, and there's been nothing similar on VPS. The IA Upload tool also has a file-based queue system, which is really not the best way to do it, but redoing that is much more work than moving the tool to a VPS, so we thought this would be an easier and quicker first step. I know it introduces extra maintenance, but we're happy to take that on, especially as it's pretty much exactly the same as other tools that we maintain (such as WS Export and Wikimedia OCR).
In general, the slowness I've seen on Toolforge with this tool has felt like there's some sort of caching going on somewhere: e.g. the job queue goes through a bunch of directories and does stuff, and on the first run I've noticed it's slow but then after that it's been fine. I've no idea really about where to start looking for that (it's definitely not the application that's caching it). Every time I've tried to look deeper it's started working fine, and I've given up.
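One way to capture the "slow on first run, fine afterwards" pattern as numbers would be to time repeated passes over the queue directory. The following is only a rough sketch, not something that was run for this task: it walks a directory tree, stats every entry, and records how long each pass takes. The function names are made up for illustration, and the directory to pass in would be whatever the tool's queue directory actually is.

```python
import os
import time


def walk_and_stat(root):
    """Walk root, lstat every directory entry, and return how many were seen."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Entries can vanish while a queue is being processed.
                continue
            count += 1
    return count


def time_passes(root, passes=3):
    """Run several passes over root; return (entry_count, seconds) per pass."""
    results = []
    for _ in range(passes):
        start = time.perf_counter()
        count = walk_and_stat(root)
        results.append((count, time.perf_counter() - start))
    return results
```

Calling time_passes() on the same directory on Toolforge and on a VPS instance, and comparing the first pass with the later ones, would show whether the difference is mostly in the first (uncached) access. It would not say where the caching happens, only how large the gap is.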
Sorry for not having better info!
Mentioned in SAL (#wikimedia-cloud) [2021-06-11T14:38:34Z] <balloons> set quota to 16 vcpu, 24G ram T284527