Request increased quota for pm20-* Toolforge tool
Closed, Resolved (Public)

Description

Tool Name: pm20-search, pm20-report, possibly others
Quota increase requested: New Postgres database
Reason: Background database for the creation of Wikidata items or links

The Wikidata project https://www.wikidata.org/wiki/Wikidata:WikiProject_20th_Century_Press_Archives and its German Wikipedia counterpart, https://de.wikipedia.org/wiki/Wikipedia:Projekt_Pressearchiv, were created to make the digitized 20th Century Press Archives more usable for Wikimedia authors and the general public.

ZBW, the owner of the archives, has no capacity for subject indexing of the archives' material. It has donated all available metadata to Wikidata and supported its integration there. The projects mentioned above aim to continue and extend this process. It would be very helpful to provide shared access to the source metadata, which is available as a Postgres dump (https://zenodo.org/records/10588897). The database uses sequences and Postgres functions, so it cannot be converted to MySQL or SQLite in a straightforward way. The dump is less than 25 MB (compressed).

Please let me know if any further information would be helpful. Cheers, Joachim

Event Timeline

Hi @Jneubert, can you please elaborate on what you want the PostgreSQL database for?

Is it just to process the dump?
Or do you want a standing instance to have the database running all the time?

(the solution might be different for each case)

Hi @dcaro, sorry for the late response - I was off for some days and missed the notification.

The request is for a standing instance, to be shared among at least two people. It would have two key uses: first, to mint identifiers for new PM20 folders (wdt:P4293), and second, to add local information (in particular, holdings information that is not well suited to Wikidata). ZBW, the collection's owner, has discontinued maintenance of the database but is willing to accept metadata to extend access to the collection.

The addition of data would be ongoing work, so there is no termination date.

Hope this helps; otherwise, do not hesitate to ask for further clarification.

Thanks, good to know.

Given that, there are two possible ways forward:

  • Creating your own database in a VM on CloudVPS:
    • You have full control (for better and worse, e.g. you handle upgrades and maintenance yourself)
    • Would be pretty stable
  • Using a Trove database on CloudVPS:
    • The DB is half-managed (you get a database with "just one click")
    • PostgreSQL is not very well supported (no automated backups or replication, OOMs under load, ...)

Note that currently both require the creation of a CloudVPS project; the difference is whether you create your own VM with Postgres or let Trove do it for you.

We can create the project and let you try both if you want. I'll need a name for the project, though (e.g. pm20database).

That sounds great! Project name "pm20database" would be fine.

I'd try a Trove instance first, and hope that backups are possible via pgAdmin. Load will be low, so I don't expect OOMs to occur. But of course, I have zero experience with this setup, so it is good to know there is the VM option.
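
Since Trove offers no automated backups for Postgres, a manual dump is one option. Below is a minimal sketch, assuming the standard `pg_dump` client is used instead of pgAdmin. The host name, user name, and database name are placeholders, not values from this task, and the helper `pg_dump_command` is hypothetical.

```python
import shlex
from datetime import date


def pg_dump_command(host, user, dbname, out_dir=".", day=None):
    """Build the argument list for a custom-format pg_dump backup.

    All connection values are supplied by the caller; nothing here is
    specific to the pm20database project.
    """
    day = day or date.today()
    # One dated file per run, e.g. ./pm20-2024-03-25.dump
    outfile = f"{out_dir}/{dbname}-{day.isoformat()}.dump"
    return [
        "pg_dump",
        "--host", host,
        "--username", user,
        "--format=custom",  # compressed archive, restorable with pg_restore
        "--file", outfile,
        dbname,
    ]


if __name__ == "__main__":
    # Placeholder values; replace with the real instance details.
    cmd = pg_dump_command("db.example.org", "backup_user", "pm20")
    # Print the command for inspection instead of running it.
    print(shlex.join(cmd))
```

The function only assembles the command. Running it (for example with `subprocess.run(cmd, check=True)`) would additionally need a password supplied through `PGPASSWORD` or `~/.pgpass`.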

Andrew claimed this task.
Andrew subscribed.

@Jneubert, I've created a project with Trove quotas for you to try. If Trove doesn't suit you, reopen this task and we can give you quotas for a self-managed VM.

Thanks, @Andrew ! I was able to log into Horizon, but I see no "Launch Instance" button:

Screenshot_20240325_081855.png (630×1 px, 65 KB)

Apparently, I also do not have access to the project:

Screenshot_20240325_081657.png (465×1 px, 40 KB)

This is the first time I have dealt with the OpenStack environment - please forgive me if I am missing something obvious.

Ok, figured it out: Had to switch from "bastion" to "pm20database" project ...

That fits -- you only have read-only access to 'bastion' but should be able to do whatever you need in pm20database. Be warned that most Horizon features don't work with Postgres (it's largely tested/designed to support MySQL). You'll be able to create the database, but after that you'll need to inject a root user and manage it via psql.
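
To illustrate the "manage it via psql" step, here is a minimal sketch of composing a `psql` invocation. The host, user, and database names are hypothetical, as is the helper `psql_command`; it also assumes the password is supplied through `PGPASSWORD` or `~/.pgpass`, which is not stated in this task.

```python
import shlex


def psql_command(host, user, dbname, port=5432, sql=None):
    """Build the argument list for connecting to a Postgres instance with psql."""
    cmd = [
        "psql",
        "--host", host,
        "--port", str(port),
        "--username", user,
        "--dbname", dbname,
    ]
    if sql is not None:
        # Run a single statement and exit instead of opening a prompt.
        cmd += ["--command", sql]
    return cmd


if __name__ == "__main__":
    # Placeholder values; the real host comes from the Trove instance details.
    print(shlex.join(psql_command("db.example.org", "root", "postgres",
                                  sql="SELECT version();")))
```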