
Request creation of "wmf-research-tools" VPS project
Closed, Resolved (Public)

Description

Project Name: wmf-research-tools

Wikitech Usernames of requestors: bmansurov

Purpose: Researchers use the project to validate research ideas, test prototypes, etc.

Brief description: As researchers, we need access to a space where we can install and play with different forms of technology. While the learning needs are usually identified or inspired by specific projects we're working on, this space should not be tied to a specific project.

How soon you are hoping this can be fulfilled: as soon as possible

Event Timeline

@diego could you update "Brief description" with more info about how you're going to use labs instances?

Can we scope the project to the "Article expansion recommendations" project and its needs rather than the highly generic "research" theme? Team-based and broad-topic Cloud VPS projects have turned out to be an anti-pattern that we would like to avoid expanding on. The long-term problem becomes determining, years later, which VMs are in use and by whom. This in turn slows down efforts to reclaim unused resources and to upgrade VMs to new base images. Umbrella projects also make it more difficult to manage access control for the VMs, which in turn tends to make umbrella project owners reluctant to collaborate with the wider technical volunteer community.

If there are many high-churn projects that the Foundation Research team expects to work on via Cloud VPS, that may be a good reason for an exception to this unwritten rule, but I would like to see some compensating control introduced to ensure that the provenance of each VM is tracked in some way within the broad project.
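
As a rough illustration of what such a compensating control could look like, here is a minimal sketch of a per-VM provenance registry. Everything in it is an assumption made for illustration: the record fields, the 180-day review window, and the example hostnames and usernames are hypothetical, and this is not an existing Cloud VPS feature.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class VMRecord:
    """Provenance entry for one VM (all field names are hypothetical)."""
    hostname: str
    owner: str           # username of the person responsible for the VM
    effort: str          # research effort the VM was created for
    last_reviewed: date  # last time the owner confirmed the VM is still needed


def stale_vms(records, today, max_age_days=180):
    """Return the records whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r.last_reviewed < cutoff]


# Hypothetical registry contents, for illustration only.
registry = [
    VMRecord("example-vm-01", "example-user-a", "prototype A", date(2018, 2, 14)),
    VMRecord("example-vm-02", "example-user-b", "prototype B", date(2017, 1, 10)),
]

# VMs that nobody has vouched for recently are candidates for follow-up.
needs_follow_up = stale_vms(registry, today=date(2018, 3, 1))
```

Keeping an owner and a review date next to each VM would make it possible to answer "which VMs are in use and by whom" later without guessing.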

@diego could you update "Brief description" with more info about how you're going to use labs instances?

Basically, we are planning to run Python scripts that require a lot of RAM (~16GB) and, if possible, implement a REST API, also in Python. Access to HDFS and Spark would also be appreciated. If this last requirement is not possible, we will need enough HDD (/SSD) to store our data, meaning at least 100GB.
In the future, we are also considering installing a NoSQL database such as Elasticsearch or MongoDB.

@bmansurov @diego The task description needs some changes. This is not a request related to a specific project/effort. In our off-site we identified a need for having access to a dedicated place where we can easily play with new and old forms of technology without having to wait for someone else to make that happen for us, which can create delays in research.

Given that the need is general and not tied to a specific project, it would be great if we could explore creating an exception for this Research space and work with bd808 to introduce a control mechanism.

leila triaged this task as High priority. Feb 7 2018, 4:08 PM
leila updated the task description.

Access to HDFS and Spark would also be appreciated

This can only happen inside the Analytics network, not from a Cloud VPS project.

we will need enough HDD (/SSD) to store our data, meaning at least 100GB

That is a lot of disk quota for an instance.

In our off-site we identified a need for having access to a dedicated place where we can easily play with new and old forms of technology without having to wait for someone else to make that happen for us, which can create delays in research.

This sounds reasonable, but if you really need to get at Analytics data, it might be easier in the long term to look into getting a small server for your use in that network. Requests for large disk and RAM quotas are always going to take some time to sort out in Cloud VPS.

@bd808, I suggested they use Cloud VPS for this, as a generic Research project, because they really need to experiment with techs like we do in the Analytics project. I'd be totally fine with giving them access to the Analytics project for this purpose too. They can't experiment with new techs inside the prod/analytics network.

We would like to suggest the project name wmf-research-tools rather than the very generic word "research", which may be confusing to folks now or in the future.

The other meta concern I have is that everyone be warned that Cloud VPS is not a safe place to store or process sensitive data. Anything that is copied out of the Analytics network into this project needs to be safe for anyone in the larger Wikimedia community to see.

Other than those tweaks/concerns, I give my +1 for this project to get started.

wmf-research-tools sounds good to us. @bd808 please let us know if you need anything else from us.

bd808 renamed this task from Request creation of "research" VPS project to Request creation of "wmf-research-tools" VPS project. Feb 14 2018, 5:44 PM
bd808 assigned this task to Andrew.
bd808 moved this task from Inbox to Clinic Duty on the cloud-services-team (Kanban) board.
bd808 updated the task description.

I've created the project, with bmansurov as the projectadmin. They can add additional members or admins as necessary. You should have enough quota to get started -- please open up an additional quota ticket if you run out.

Thank you everyone for helping with this task.