Page MenuHomePhabricator

Upgrade wikilabels databases to buster/bullseye
Closed, ResolvedPublic

Description

There's a pair of Postgres instances (clouddb-wikilabels-01,02) in the infrastructure clouddb-services project that need to be upgraded to Buster or Bullseye.

The database contents seem fairly small:

postgres=# SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database;
      datname      | pg_size_pretty
-------------------+----------------
 postgres          | 7079 kB
 u_wikilabels_test | 58 MB
 template1         | 6969 kB
 template0         | 6969 kB
 u_wikilabels      | 163 MB
 s52524_rwtest     | 7103 kB
(6 rows)

Not sure why this is living in clouddb-services? https://wikitech.wikimedia.org/wiki/Wikilabels suggests those were previously running on bare metal in the prod realm, maybe those were just simply pushed to the clouddb-services project as a first step. Anyways these instances seem to be only serving a single project so it's worth considering if we can push the maintenance of this rather small service to the people who actually use it.

Event Timeline

taavi triaged this task as Medium priority.May 2 2022, 6:44 PM
taavi created this task.

Thanks a lot for the ping! I think that this is an old project that we (as ML team) may not keep supporting, but I'll let @calbon and @kevinbazira to comment since they have more context. In case we can cleanup/deprecate everything :)

Hello - any updates here? Thanks!

Sorry for the late response. @elukey regarding cleaningup/deprecating, the last I heard was wikilabels was to be replaced by our own labeling tool called pilot-flag proposed by @calbon. I'll leave this for him to clarify.

@taavi we are going to ask to the community if it is ok to drop these, would it be ok to wait a few days more?

These power Wikilabels, which is how training data is collected to create new and upgrades RevScoring/ORES models. It isn't frequently used, maybe once or twice a year, but until we have a) an alternative tool or b) an alternative to RevScoring/ORES models we need to keep Wikilabels available to the community.

TL;DR we can't drop these postgresql instances. We need to move them or upgrade them.

Change 849095 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] wikilabels: move Postgres DB to its own (non-wmcs) role

https://gerrit.wikimedia.org/r/849095

Change 849095 merged by Klausman:

[operations/puppet@production] wikilabels: move Postgres DB to its own (non-wmcs) role

https://gerrit.wikimedia.org/r/849095

Change 850191 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] wikilabels: fix wrong path for Postgres tuning.conf

https://gerrit.wikimedia.org/r/850191

Change 850191 merged by Klausman:

[operations/puppet@production] wikilabels: fix wrong path for Postgres tuning.conf

https://gerrit.wikimedia.org/r/850191

Change 850418 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] wikilabels: maybe get the tuning.conf source part right

https://gerrit.wikimedia.org/r/850418

Change 850418 merged by Klausman:

[operations/puppet@production] wikilabels: maybe get the tuning.conf source part right

https://gerrit.wikimedia.org/r/850418

Change 850419 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] wikilabels: actually install Postgres

https://gerrit.wikimedia.org/r/850419

Change 850419 abandoned by Klausman:

[operations/puppet@production] wikilabels: actually install Postgres

Reason:

Problem was not with a missing file but wrong/missing locales

https://gerrit.wikimedia.org/r/850419

Mentioned in SAL (#wikimedia-cloud) [2022-11-02T09:37:34Z] <taavi> shut down the wikilabels servers T307389

These are shut down now and can be deleted next week.

Change 852135 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] Drop toolschecker checks for the wikilabels database

https://gerrit.wikimedia.org/r/852135

Change 852135 merged by FNegri:

[operations/puppet@production] Drop toolschecker checks for the wikilabels database

https://gerrit.wikimedia.org/r/852135

Created VM and Puppet stuff as detailed above, and migrated the data, then switched the uwsgi applications on the main instance and staging to use said VM. Updated docs accordingly, including this new section:

Database dump & restore / Migration

The uwsgi app on wikilabels-03 uses a Postgres database as a backing store. This used to be a clouddb instance, but as of November 2022, is a separate VM, wikilabels-database-02.

Database credentials

The uwsgi app keeps its database configuration in two files, /srv/wikilabels/config/default-db-config.yaml and /srv/wikilabels/config/config/98-database.yaml.

The first file contains host information and user credentials, though the latter are unused. The actual username, password and database (inside of
Postgres) to use are in the second file, 98-database.yaml.

Dumping data

To dump the data in an easily restored format, use the pg_dump tool. You can run this on either wikilabels-03 or wikilabels-database-02:

pg_dump -U u_wikilabels -h wikilabels-database-03 u_wikilabels -f pg_dump-$(date -Is).sql

Note that the expected database name is u_wikilabels, just like the user name.

The above command will prompt you for the password of the u_wikilabels user, and then dump the database content as a series of SQL commands to stdout, and we redirect that to a timestamped file. The total amount of data is about 100MB.

Restoring data

To restore the saved data, copy the file to a convenient host (the database host itself is usually easiest). If necessary, create the user and database on the new Postgres instance (as root):

$ sudo -u postgres createuser u_wikilabels
$ sudo -u postgres psql
psql (13.8 (Debian 13.8-0+deb11u1))
Type "help" for help.

postgres=# \password u_wikilabels
Enter new password for user "u_wikilabels": 
Enter it again: 
postgres=# exit
$ sudo -u postgres createdb -O u_wikilabels u_wikilabels

This creates the u_wikilabels user, sets their password, and then creates a new database called u_wikilabels owned by the just-created user.

You can then restore the saved data by piping the dump file into an appropriate psql connection command:

psql -h localhost -W -d u_wikilabels  -U u_wikilabels  < pg_dump-[timestamp].sql

This will display some messages on stdout about what SQL commands are run (SET, CREATE TABLE etc).

Once this is done, the uwsgi application can be pointed at the new DB by editing the host setting /srv/wikilabels/config/default-db-config.yaml and restarting the application.

klausman moved this task from Unsorted to Complete Q3 2022/23 on the Machine-Learning-Team board.

Change 852779 had a related patch set uploaded (by Klausman; author: Klausman):

[operations/puppet@production] wikilabels: Cleanup old DB proxy information

https://gerrit.wikimedia.org/r/852779

Change 852779 merged by Klausman:

[operations/puppet@production] wikilabels: Cleanup old DB proxy information

https://gerrit.wikimedia.org/r/852779

Mentioned in SAL (#wikimedia-cloud) [2022-11-08T08:38:46Z] <taavi> delete clouddb-wikilabels-01,02 T307389

All done. Thank you!

Change 854478 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] Drop old wikilabels database roles

https://gerrit.wikimedia.org/r/854478

Change 854479 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] openstack: drop wikilabels global dns entries

https://gerrit.wikimedia.org/r/854479

Change 854478 merged by FNegri:

[operations/puppet@production] Drop old wikilabels database roles

https://gerrit.wikimedia.org/r/854478

Change 854479 merged by FNegri:

[operations/puppet@production] openstack: drop wikilabels global dns entries

https://gerrit.wikimedia.org/r/854479

Mentioned in SAL (#wikimedia-cloud) [2023-01-04T14:12:04Z] <dhinus> deleted CNAME wikilabels.db.svc.eqiad.wmflabs (T307389)