
Gerrit VM to test data migration
Closed, Resolved; Public

Description

We need to test the migration of data from Gerrit schema 2.15 to Gerrit schema 2.16 using "real" data. Since the data is private, I can't do this on a labs machine. This task requests a Ganeti VM to be used for those tests. It will be reclaimed once the migration has been completed.

Site/Location: eqiad
Number of systems: 1
Service: Gerrit

Networking Requirements: Ability to easily copy data from gerrit1001, access to gerrit sql
Processor Requirements: 8 (migration is IO bound, but getting an understanding of timing on "production-like" hardware would be ideal)

Memory: 16G
Disks: 80G (32 GB worth of git data + overhead)
Other Requirements: None
Project Duration: 3 weeks (hopefully less)

Details

Repo               Branch      Lines +/-
operations/puppet  production  +0 -7
operations/dns     master      +0 -4
operations/dns     master      +1 -4
operations/puppet  production  +0 -36
operations/puppet  production  +1 -1
operations/puppet  production  +0 -9
operations/puppet  production  +0 -2
operations/puppet  production  +1 -11
operations/puppet  production  +0 -1
operations/puppet  production  +56 -44
operations/puppet  production  +7 -5
operations/puppet  production  +1 -1
operations/puppet  production  +2 -1
operations/puppet  production  +6 -0
operations/puppet  production  +2 -1
operations/puppet  production  +1 -1
operations/dns     master      +9 -9
operations/puppet  production  +1 -1
operations/puppet  production  +2 -3
operations/puppet  production  +7 -0
operations/puppet  production  +3 -0
operations/puppet  production  +5 -1
operations/puppet  production  +11 -5
operations/puppet  production  +4 -0
operations/puppet  production  +8 -1
operations/dns     master      +2 -0
operations/dns     master      +2 -5
operations/dns     master      +8 -1
operations/puppet  production  +1 -0


Event Timeline


Change 562965 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] ferm_misc/db: allow connections from gerrit-test in ferm

https://gerrit.wikimedia.org/r/562965

Change 563284 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: make db_user configurable in Hiera

https://gerrit.wikimedia.org/r/563284

Change 563284 merged by Dzahn:
[operations/puppet@production] gerrit: make db_user configurable in Hiera

https://gerrit.wikimedia.org/r/563284

Change 563302 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: use 'gerritro' readonly db user on test server

https://gerrit.wikimedia.org/r/563302

Change 563302 merged by Dzahn:
[operations/puppet@production] gerrit: use 'gerritro' readonly db user on test server

https://gerrit.wikimedia.org/r/563302

Change 565392 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: rename gerrit-test to gerrit1002

https://gerrit.wikimedia.org/r/565392

Change 565392 merged by Dzahn:
[operations/puppet@production] install_server: rename gerrit-test to gerrit1002

https://gerrit.wikimedia.org/r/565392

Change 565395 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: replace gerrit-test with gerrit1002

https://gerrit.wikimedia.org/r/565395

Change 565395 merged by Dzahn:
[operations/puppet@production] site: replace gerrit-test with gerrit1002

https://gerrit.wikimedia.org/r/565395

Change 565399 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] add IPs for gerrit1002 in row C

https://gerrit.wikimedia.org/r/565399

Mentioned in SAL (#wikimedia-operations) [2020-01-16T22:38:41Z] <mutante> ganeti1003 - deleting VM gerrit-test (T239151)

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: gerrit-test.wikimedia.org

  • gerrit-test.wikimedia.org (FAIL)
    • Downtimed host on Icinga
    • No management interface found (likely a VM)
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Failed to shutdown, manual intervention required: Cumin execution failed (exit_code=2)
    • Set Netbox status on VM not yet supported: manual intervention required
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Change 565399 merged by Dzahn:
[operations/dns@master] add IPs for gerrit1002 in row C

https://gerrit.wikimedia.org/r/565399

IP situation fixed!

server:

gerrit1002.wikimedia.org has address 208.80.154.75
gerrit1002.wikimedia.org has IPv6 address 2620:0:861:3:208:80:154:75

service:

gerrit-test.wikimedia.org has address 208.80.154.78
gerrit-test.wikimedia.org has IPv6 address 2620:0:861:3:208:80:154:78
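Both IPv6 addresses above appear to spell out the decimal octets of the matching IPv4 address in their last four groups, under the prefix 2620:0:861:3. A minimal Python sketch of that mapping follows. It assumes the pattern inferred from these two record pairs holds in general; the function name and the prefix constant are illustrative and are not taken from the DNS repository.

```python
import ipaddress

# Prefix inferred from the two record pairs above; treat it as an assumption.
ASSUMED_PREFIX = "2620:0:861:3"


def embedded_ipv6(ipv4: str, prefix: str = ASSUMED_PREFIX) -> ipaddress.IPv6Address:
    """Build an IPv6 address whose last four groups spell out the IPv4 octets."""
    # Parsing through IPv4Address validates the input before it is reused.
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    return ipaddress.IPv6Address(prefix + ":" + ":".join(octets))


print(embedded_ipv6("208.80.154.75"))  # host address
print(embedded_ipv6("208.80.154.78"))  # service address
```

The octets are copied as text into the hexadecimal groups, so the result is readable by eye but is not a numeric conversion of the IPv4 address.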

recreating VM as gerrit1002 so that we can use gerrit-test as service name:

Creating new VM named gerrit1002.wikimedia.org in eqiad with row=C vcpu=1 memory=16 gigabytes disk=80 gigabytes link=public
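The creation message can be checked against the values requested in the task description (8 processors, 16G memory, 80G disk). The following is only a sketch: the helper names and the REQUESTED dictionary are hypothetical, and it parses the message text shown above rather than querying Ganeti.

```python
import re

# Values requested in the task description; the dictionary itself is illustrative.
REQUESTED = {"vcpu": 8, "memory": 16, "disk": 80}


def parse_created(line: str) -> dict:
    """Extract the numeric key=value pairs from the VM creation message."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def mismatches(requested: dict, created: dict) -> dict:
    """Return {key: (requested, created)} for every value that differs."""
    return {
        key: (value, created.get(key))
        for key, value in requested.items()
        if created.get(key) != value
    }


created = parse_created(
    "Creating new VM named gerrit1002.wikimedia.org in eqiad "
    "with row=C vcpu=1 memory=16 gigabytes disk=80 gigabytes link=public"
)
print(mismatches(REQUESTED, created))
```

For the message above, this reports only the vcpu value: 1 where 8 was requested.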

Change 565708 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] install_server: update MAC address of gerrit1002

https://gerrit.wikimedia.org/r/565708

Change 565708 merged by Dzahn:
[operations/puppet@production] install_server: update MAC address of gerrit1002

https://gerrit.wikimedia.org/r/565708

Change 565715 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: set gerrit host name and server list for gerrit1002/gerrit-test

https://gerrit.wikimedia.org/r/565715

Change 565716 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] acme_chief/gerrit: remove gerrit-new, add gerrit1002

https://gerrit.wikimedia.org/r/565716

Change 565716 merged by Dzahn:
[operations/puppet@production] acme_chief/gerrit: remove gerrit-new, add gerrit1002

https://gerrit.wikimedia.org/r/565716

Change 565715 merged by Dzahn:
[operations/puppet@production] gerrit: set gerrit host name and server list for gerrit1002/gerrit-test

https://gerrit.wikimedia.org/r/565715

Change 562587 merged by Dzahn:
[operations/puppet@production] gerrit: assign host gerrit1002 role::gerrit

https://gerrit.wikimedia.org/r/562587

The VM is now usable. It has the role(gerrit) on it and no more puppet errors. It uses its own service name/IP:

https://gerrit-test.wikimedia.org

Shell access is automatically granted by the role to the same people who have it on the prod server.

Monitoring and backups should be disabled.

Gerrit is configured to only know about itself and not the other Gerrit servers.

There are 63G free on / including /srv

The mysql user has also been made configurable (along with backups / monitoring) and it is using:

    hostname = m2-master.eqiad.wmnet
    database = reviewdb
    username = gerritro

Note the 'gerritro' read-only user.

    heapLimit = 5g
    slave = false

    canonicalWebUrl = https://gerrit-test.wikimedia.org/r

[sshd]
    listenAddress = gerrit-test.wikimedia.org:29418

    listenAddress = [2620:0:861:3:208:80:154:78]:29418
Dzahn claimed this task.

The gerrit acmechief TLS cert has been updated to contain "gerrit-test" in addition to gerrit and gerrit-replica. The "gerrit-new" name has been removed from it. This affected all Gerrit servers, including prod gerrit1001 which has the new cert now.

Change 562965 merged by Dzahn:
[operations/puppet@production] ferm_misc/db: allow connections from gerrit1002 in ferm

https://gerrit.wikimedia.org/r/562965

Change 566367 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: allow multiple rsync destination hosts in migration class

https://gerrit.wikimedia.org/r/566367

Change 566367 merged by Dzahn:
[operations/puppet@production] gerrit: allow multiple rsync destination hosts in migration class

https://gerrit.wikimedia.org/r/566367

Mentioned in SAL (#wikimedia-releng) [2020-01-21T21:48:46Z] <mutante> gerrit - rsyncing git data from gerrit1001 to gerrit1002 (T239151)

Mentioned in SAL (#wikimedia-releng) [2020-01-21T22:09:48Z] <mutante> gerrit - rsyncing 'git' and 'plugin' data dirs and /var/lib/gerrit2/review_site/ from gerrit1001 to gerrit1002 WITH --delete T239151

Mentioned in SAL (#wikimedia-operations) [2020-02-03T23:21:49Z] <mutante> gerrit1002 - deleting gerrit.log and gerrit.json files from January to free about 4GB of space (T239151 T243983)

Mentioned in SAL (#wikimedia-operations) [2020-02-03T23:26:32Z] <mutante> ganeti1003 - sudo gnt-instance modify --disk add:size=10G gerrit1002.wikimedia.org (T239151 T243983)

Mentioned in SAL (#wikimedia-operations) [2020-02-04T00:09:21Z] <mutante> gerrit1002 - replaced ens5 with ens6 in /etc/network/interfaces (IP and row had changed in the past, needed manual fix after reboot and now came back) ; mkfs.ext4 /dev/vdb on new additional 10GB disk. (T239151 T243983)

Dzahn added a subscriber: QChris.

@QChris fyi, this is the dedicated test machine for the gerrit upgrade, you can feel free to use it. I confirmed your shell user exists.

Also see T243808#6025787: for some reason logs are not gzipped there, but they are on the prod Gerrit server, even though both use the same role.

Change 594293 had a related patch set uploaded (by Cwhite; owner: Cwhite):
[operations/puppet@production] profile,gerrit: add enable_monitoring flag for gerrit-test

https://gerrit.wikimedia.org/r/594293

Change 594293 merged by Dzahn:
[operations/puppet@production] profile,gerrit: add enable_monitoring flag for gerrit-test

https://gerrit.wikimedia.org/r/594293

Icinga monitoring for gerrit1002 has been removed. Thanks Cole.

Was blocked by T243800 which is now fixed.

However, the gerrit service still does not stay running because of other issues, such as file permissions.

Reopening

Mentioned in SAL (#wikimedia-operations) [2020-05-28T10:02:39Z] <mutante> gerrit1002 (test server) - chown -R gerrit2:gerrit2 /var/lib/gerrit/review_site ; restarted gerrit service, now the service is not in restart loop anymore, gerrit-ssh is listening too, just not accepting publickey (T239151)
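Before running a recursive chown like the one above, it can help to list which paths actually have the wrong owner. A minimal sketch, assuming a plain directory walk is sufficient; the function name is hypothetical and nothing here is part of the Gerrit puppet role.

```python
import os


def wrongly_owned(root: str, expected_uid: int) -> list:
    """Return paths under root (root included) that expected_uid does not own."""
    bad = []
    if os.lstat(root).st_uid != expected_uid:
        bad.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself instead of following it.
            if os.lstat(path).st_uid != expected_uid:
                bad.append(path)
    return bad
```

On the test host this could be called as wrongly_owned("/var/lib/gerrit/review_site", pwd.getpwnam("gerrit2").pw_uid), using the path and user from the SAL entry. An empty list would mean the chown has nothing left to fix.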

● gerrit.service - Gerrit code review tool
   Loaded: loaded (/lib/systemd/system/gerrit.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-05-28 09:51:08 UTC; 12min ago
[contint2001:~] $ sudo -u zuul ssh -p 29418 jenkins-bot@gerrit-test.wikimedia.org

  ****    Welcome to Gerrit Code Review    ****

  Hi jenkins-bot, you have successfully connected over SSH.

  Unfortunately, interactive shells are disabled.
  To clone a hosted Git repository, use:

  git clone ssh://jenkins-bot@gerrit-test.wikimedia.org:29418/REPOSITORY_NAME.git

Connection to gerrit-test.wikimedia.org closed.

cc: @hashar @QChris ^

Dzahn claimed this task.

As Chris points out, this VM only has 1 vCPU, but 8 were requested and he needs 8. That seems to have been my mistake. Reopening.

Mentioned in SAL (#wikimedia-operations) [2020-06-05T12:41:16Z] <mutante> rebooting gerrit1002 to add more vCPUs, after [ganeti1009:~] $ sudo gnt-instance modify -B vcpus=8 gerrit1002.wikimedia.org T239151

[gerrit1002:~] $ lscpu
...
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           8
...
CPU MHz:             2499.998
BogoMIPS:            4999.99
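The same check can be scripted by parsing the lscpu output. The sketch below works on the excerpt shown above; the function name is hypothetical and the sample text is copied from this comment rather than read from a live host.

```python
def parse_lscpu(text: str) -> dict:
    """Turn 'Key:   value' lines from lscpu into a dict; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # only lines that contain a colon carry a field
            fields[key.strip()] = value.strip()
    return fields


sample = """\
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  1
Core(s) per socket:  1
Socket(s):           8
"""

info = parse_lscpu(sample)
assert int(info["CPU(s)"]) == 8, "expected the 8 vCPUs requested in the task"
print(info["CPU(s)"], info["Socket(s)"])
```

On a live host, the input could instead come from subprocess.run(["lscpu"], capture_output=True, text=True).stdout.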

T254644 has some comments about removing this VM again now that Gerrit prod is on 3.2. We can also use this ticket for the decom; feel free to reopen it.

Change 609875 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] zuul: remove gerrit-test connection and setup

https://gerrit.wikimedia.org/r/609875

Change 609878 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] acme_chief: remove gerrit-test

https://gerrit.wikimedia.org/r/609878

Change 609879 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/DHCP/partman: decom gerrit1002

https://gerrit.wikimedia.org/r/609879

Change 609883 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: stop rsyncing to gerrit1002

https://gerrit.wikimedia.org/r/609883

Change 609884 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mariadb: remove ferm firewall hole for gerrit servers

https://gerrit.wikimedia.org/r/609884

Change 609886 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove gerrit-test.wikimedia.org

https://gerrit.wikimedia.org/r/609886

Change 609887 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] remove gerrit1002.wikimedia.org

https://gerrit.wikimedia.org/r/609887

Change 609883 merged by Dzahn:
[operations/puppet@production] gerrit: stop rsyncing to gerrit1002

https://gerrit.wikimedia.org/r/609883

Change 609875 merged by Dzahn:
[operations/puppet@production] zuul: remove gerrit-test connection and setup

https://gerrit.wikimedia.org/r/609875

Change 609878 merged by Dzahn:
[operations/puppet@production] acme_chief: remove gerrit-test

https://gerrit.wikimedia.org/r/609878

Change 610139 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] gerrit: remove absented host key file for gerrit-test

https://gerrit.wikimedia.org/r/610139

Change 610139 merged by Dzahn:
[operations/puppet@production] gerrit: remove absented host key file for gerrit-test

https://gerrit.wikimedia.org/r/610139

Change 610142 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: remove gerrit role from gerrit1002

https://gerrit.wikimedia.org/r/610142

Change 610142 merged by Dzahn:
[operations/puppet@production] site: remove gerrit role from gerrit1002

https://gerrit.wikimedia.org/r/610142

cookbooks.sre.hosts.decommission executed by dzahn@cumin1001 for hosts: gerrit1002.wikimedia.org

  • gerrit1002.wikimedia.org (PASS)
    • Downtimed host on Icinga
    • Found Ganeti VM
    • VM shutdown
    • Started forced sync of VMs in Ganeti cluster ganeti01.svc.eqiad.wmnet to Netbox
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB
    • VM removed

Change 609879 merged by Dzahn:
[operations/puppet@production] site/DHCP/partman: decom gerrit1002

https://gerrit.wikimedia.org/r/609879

Change 609886 merged by Dzahn:
[operations/dns@master] remove gerrit-test.wikimedia.org

https://gerrit.wikimedia.org/r/609886

Change 609887 merged by Dzahn:
[operations/dns@master] remove gerrit1002.wikimedia.org

https://gerrit.wikimedia.org/r/609887

The VM has been removed from all places in the repos, puppet and DNS.

The decom cookbook destroyed it and removed it from monitoring, puppetdb etc.

Change 609884 merged by Dzahn:
[operations/puppet@production] mariadb: remove ferm firewall hole for gerrit servers

https://gerrit.wikimedia.org/r/609884