
Request creation of deep-learning-services VPS project
Closed, Resolved · Public

Description

Project Name: deep-learning-services
Purpose: Building more reliable infrastructure for running deep learning models.
Wikitech Username of requestor: Ebrahim

Hey. I am currently using Toolforge to run this experimental work, and things seem to be going well, except that I seem to be hitting memory limits: I currently use a ~80 MB model, but I seem to hit the limits when loading the two different 500 MB models that are available. As I am thinking about scaling the work up, somewhat faster machines that can run the models with more concurrency would be nice, I believe. As I've described on Commons, the service currently uses pre-trained models, but I am thinking of expanding the work by using WMF content and also doing some training, initially in a limited and not very time-consuming form. There are also some other available open-source models that need more privileges and access to the machine, so having a VPS would be better for them.

I currently have admin access to another group, have experience with the WMF VPSes, and could create my VPSes there. However, I thought it would be better to split this out into a separate group, so that work and changes in that group, say a policy change, won't affect the machines of this project, and vice versa.

Event Timeline

Ebraminio created this task. · Aug 3 2017, 6:30 PM
Restricted Application added a subscriber: Aklapper. · Aug 3 2017, 6:30 PM
Dzahn added a subscriber: Dzahn. · Aug 3 2017, 6:31 PM
Ebraminio updated the task description. · Aug 3 2017, 7:15 PM
Ebraminio added a comment. · Edited · Aug 3 2017, 8:36 PM

Also, this is not something I would use immediately (I am not going to do such training this soon, and even then it would not be that compute-intensive), but for future developments it would be nice if you could consider providing some GPU-capable machines among the options available when creating a new machine. I have some experience with GPUs locally, both through pure CUDA and through ML libraries, and can already use them for my private work. AWS provides them as a service, but I don't have access to it.

Edited: Tracked on T148843

Ebraminio updated the task description. · Aug 6 2017, 2:40 PM

Any news? This has really delayed my work, as some of the models I am going to use for the upcoming work (which can be tracked on the given link) need more system privileges...

chasemp triaged this task as Normal priority. · Aug 25 2017, 1:06 PM
Andrew added a subscriber: Andrew. · Aug 25 2017, 1:59 PM

@Ebraminio sorry about the delay in responding -- this task wasn't tagged with 'Project-requests' and so was largely invisible until just now. We'll review it during our weekly meeting on Tuesday.

bd808 added a subscriber: bd808. · Aug 25 2017, 2:56 PM

Also, this is not something I would use immediately (I am not going to do such training this soon, and even then it would not be that compute-intensive), but for future developments it would be nice if you could consider providing some GPU-capable machines among the options available when creating a new machine. I have some experience with GPUs locally, both through pure CUDA and through ML libraries, and can already use them for my private work. AWS provides them as a service, but I don't have access to it.
Edited: Tracked on T148843

Adding GPUs to the Cloud VPS offering has been discussed in the past year. These discussions happened mostly on private email threads, so I don't have a ticket to point you to for the discussion. We have no plans to purchase GPUs in the fiscal year 2017/2018 budget. The analytics team is currently experimenting with this, as you have pointed out. If they have favorable results, we may revisit the decision and investigate the maturity of upstream projects to support GPUs in OpenStack.

</threadjack>

Ebraminio added a comment. · Edited · Sep 4 2017, 2:05 PM

Hey guys, was this discussed last week, or will it be discussed next week? Also, have a look at the end of the linked discussion to see the tool that has already been developed.

bd808 claimed this task. · Sep 5 2017, 8:21 PM
Restricted Application added a project: User-bd808. · Sep 5 2017, 8:21 PM