PAWS single-user pods (the actual user execution environment) currently have RAM fixed at 3 GB. That is enough for many workloads, but even a simple TensorFlow model can easily get killed by the memory limit. Some of our nodes have more RAM, which opens up the possibility of allowing a larger request for pods that need it. This is supported in Toolforge, but it is not possible in PAWS right now.
The questions are:
- How do we expose such a feature to users?
- What limits can we place on the feature other than "consuming an entire host won't actually work"?
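One possible answer to the first question: PAWS is built on JupyterHub with KubeSpawner, and KubeSpawner's `profile_list` setting already provides a spawn-time menu that could expose pod sizes to users. The sketch below is illustrative only — the profile names, memory sizes, and guarantees are assumptions, not a proposal for specific limits.

```python
# Hypothetical JupyterHub config fragment (e.g. jupyterhub_config.py).
# `c` is the config object JupyterHub injects into config files.
# KubeSpawner renders `profile_list` as a selection form at spawn time;
# the chosen entry's `kubespawner_override` values are applied to the pod.
c.KubeSpawner.profile_list = [
    {
        "display_name": "Standard (3 GB RAM)",  # current fixed size
        "default": True,
        "kubespawner_override": {"mem_limit": "3G"},
    },
    {
        "display_name": "Large (8 GB RAM)",  # size is an assumption
        "description": "For memory-heavy notebooks, e.g. model training",
        "kubespawner_override": {"mem_limit": "8G"},
    },
]
```

Enforcing the second question's limits (beyond "don't consume a whole host") would still need something on top of this, such as restricting the larger profiles to specific users or groups.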
An alternative is to provide a pipeline that allows a notebook to be run on Toolforge.
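A pipeline like that could plausibly be built on the Toolforge jobs framework plus `nbconvert`'s execute mode. The command below is a sketch under those assumptions — the job name, image, and notebook path are placeholders, not an existing PAWS feature.

```shell
# Hypothetical: run a notebook non-interactively as a Toolforge job.
# `jupyter nbconvert --execute` runs all cells and writes the result;
# the notebook path and image name are illustrative.
toolforge jobs run run-notebook \
    --image python3.11 \
    --command "jupyter nbconvert --to notebook --execute \
               --output executed.ipynb my-analysis.ipynb"
```

This sidesteps the pod-sizing question entirely for batch-style workloads, since Toolforge jobs can already request more memory than a PAWS pod gets.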