Project Name: deep-learning-services
Purpose: Building more reliable infrastructure for running deep learning models.
Wikitech Username of requestor: Ebrahim
Hey. I am currently using Toolforge to run this experimental work, and things are going well except that I seem to be hitting memory limits (I am currently using a ~80 MB model, but I appear to hit the limit when loading two different 500 MB models). Since I am thinking about scaling the work, somewhat faster machines able to run the models with more concurrency would be helpful. As I've described on Commons, the service currently uses pre-trained models, but I am considering expanding the work to also do some training on WMF content, initially in a limited and not very time-consuming form. There are also some other open source models that need more privileged access to the machine, so a VPS would be a better fit for them.
I currently have admin access to another group, and since I have experience with the WMF VPSes, I could create my VPSes there. However, I thought it would be better to split this out into a separate project, so that work and changes in that group, say a policy change, won't affect this project's machines, and vice versa.