
Request increased quota for etytree labs project
Closed, Resolved (Public)

Description

Project Name: etytree
Type of quota increase requested: ram
Reason:

I am going to install a triplestore (Virtuoso) to query RDF data equivalent to the server accessible at http://kaiko.getalp.org/sparql, which is managed by a group from the University of Grenoble and uses 32 GB of RAM. Since we don't have administration privileges on that resource, and we need them in order to develop the project, we are trying to install the server in the Wikimedia Labs. The data consists of 80 million triples (statistics calculated with the following query at the Virtuoso server I have used so far, http://kaiko.getalp.org/sparql:

SELECT ?p (COUNT(?p) AS ?count) WHERE {
  GRAPH <http://kaiko.getalp.org/dbnary/eng> { ?s ?p ?o }
}
GROUP BY ?p
ORDER BY DESC(?count)

)
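For anyone who wants to reproduce the statistic, here is a minimal Python sketch of how the per-predicate counts could be fetched and summed. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (a GET request with a `query` parameter) and returns SPARQL JSON results when asked through the `Accept` header. The helper names `build_request`, `fetch_counts` and `total_triples` are illustrative and not part of any existing tool. The query uses the parenthesised `(COUNT(?p) AS ?count)` form from standard SPARQL 1.1.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint and graph named in this request.
ENDPOINT = "http://kaiko.getalp.org/sparql"
GRAPH = "http://kaiko.getalp.org/dbnary/eng"

# Per-predicate triple counts for the graph, largest first.
QUERY = (
    "SELECT ?p (COUNT(?p) AS ?count) WHERE { "
    f"GRAPH <{GRAPH}> {{ ?s ?p ?o }} "
    "} GROUP BY ?p ORDER BY DESC(?count)"
)


def build_request(endpoint=ENDPOINT, query=QUERY):
    """Build a SPARQL Protocol GET request (assumed to be supported by the endpoint)."""
    url = endpoint + "?" + urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def fetch_counts(request):
    """Send the request and decode the SPARQL JSON results document."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        return json.load(resp)


def total_triples(results):
    """Sum the ?count column; each triple has exactly one predicate."""
    return sum(int(row["count"]["value"]) for row in results["results"]["bindings"])


# Usage (performs a network call, so it is left commented out):
# print(total_triples(fetch_counts(build_request())))
```

Summing the per-predicate counts gives the total number of triples in the graph, because every triple contributes to exactly one predicate group.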

Considering that we need to apply ontology inference rules to the database, the total number of triples could increase by a factor of 2 or 3, which makes for a total of possibly 240 million triples.
A standard setup for such data requires at least 32 GB of RAM (see for example https://joernhees.de/blog/2010/10/31/setting-up-a-local-dbpedia-mirror-with-virtuoso/ where they have 257 million triples and use 32 GB of RAM), especially since we need to be able to run complex queries to produce a complex graphical output.
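The arithmetic behind this estimate can be sketched as follows. The triple counts and the 257 million / 32 GB reference point come from the text above. The linear scaling of RAM with triple count is only a crude assumption made for illustration: real memory needs also depend on indexes, configuration and query workload. The function names are hypothetical.

```python
# Figures taken from the request text.
BASE_TRIPLES = 80_000_000          # triples currently in the graph
INFERENCE_FACTORS = (2, 3)         # expected growth after applying inference rules
REFERENCE_TRIPLES = 257_000_000    # DBpedia mirror setup cited above
REFERENCE_RAM_GB = 32              # RAM used by that reference setup


def estimate_triples(base, factor):
    """Triples after inference, assuming a uniform growth factor."""
    return base * factor


def scaled_ram_gb(triples, ref_triples=REFERENCE_TRIPLES, ref_ram_gb=REFERENCE_RAM_GB):
    """RAM estimate under a linear-scaling assumption (a rough guide only)."""
    return triples / ref_triples * ref_ram_gb


for factor in INFERENCE_FACTORS:
    triples = estimate_triples(BASE_TRIPLES, factor)
    print(f"factor {factor}: {triples:,} triples, ~{scaled_ram_gb(triples):.0f} GB")
```

Under that assumption the estimate lands at roughly 20 to 30 GB, which is in line with asking for an instance in the 32 GB range.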

The above calculations are estimates. If, once the setup has been completed and the queries have been defined, we see that fewer resources are used, we will scale them down.

Note:
We got the impression that getting this amount of memory in the Wikimedia Labs can be fairly difficult. If it is not possible, we will likely install the server locally at the RECAS (https://www.recas-bari.it/index.php/en/), where we will be assigned a virtual machine with 32 GB of RAM. We would prefer to have the resource installed on Wikimedia Labs for two main reasons:

  1. the project uses Wiktionary dumps
  2. sharing resources with volunteers and maintainability would be much easier if this resource was hosted at the WMF.

Event Timeline

I've added a new instance type ('flavor') to your project called 'bigram' with 36 GB of RAM. I've also increased the project quota to permit creation of one bigram instance in addition to the instance you already have there. Please delete the old instance once you've migrated your configuration.

By the way -- is this project intended to run indefinitely, or does it have an expected lifespan? (If the latter, I'll make a note to clean up after the use period ends.)

-Andrew

That's great, thank you! I just deleted the old instance.

I hope the project will run indefinitely. As mentioned before, if I manage to reduce the amount of memory needed I'll write a note.