
cloudvps: eqiad1: move neutron db to m5-master
Closed, Resolved (Public)

Description

The neutron database for the eqiad1 deployment currently runs on cloudcontrol1003.wikimedia.org (local mysql daemon). This database needs to be moved to m5-master before eqiad1 goes into full production.

Event Timeline

aborrero triaged this task as Medium priority. Aug 20 2018, 9:06 AM
aborrero created this task.

I copied the raw DB dump from cloudcontrol1003 to m5-master.

Change 453987 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: eqiad1: move neutron db to m5-master

https://gerrit.wikimedia.org/r/453987

Change 453987 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: eqiad1: move neutron db to m5-master

https://gerrit.wikimedia.org/r/453987

Mentioned in SAL (#wikimedia-operations) [2018-08-20T11:01:52Z] <arturo> T202261 disabled puppet in cloudcontrol1003.wikimedia.org, cloudcontrol1004.wikimedia.org, cloudnet1003.eqiad.wmnet, cloudnet1004.eqiad.wmnet

Created DB grants on m5-master.eqiad.wmnet:

GRANT ALL PRIVILEGES ON neutron.* TO 'neutron'@'%' IDENTIFIED BY 'xxxxxxxxxx';

And tested connectivity from all 4 affected hosts:

aborrero@cloudcontrol1004:~ $ mysql -h m5-master.eqiad.wmnet neutron -u neutron -e 'SHOW TABLES' -p
Enter password: 
+-----------------------------------------+
| Tables_in_neutron                       |
+-----------------------------------------+
[...]
aborrero@cloudcontrol1003:~ 2s 1 $ mysql -h m5-master.eqiad.wmnet neutron -u neutron -e 'SHOW TABLES' -p
Enter password: 
+-----------------------------------------+
| Tables_in_neutron                       |
+-----------------------------------------+
[...]
aborrero@cloudnet1003:~ $ mysql -h m5-master.eqiad.wmnet neutron -u neutron -e 'SHOW TABLES' -p
Enter password: 
+-----------------------------------------+
| Tables_in_neutron                       |
+-----------------------------------------+
[...]
aborrero@cloudnet1004:~ $ mysql -h m5-master.eqiad.wmnet neutron -u neutron -e 'SHOW TABLES' -p
Enter password: 
+-----------------------------------------+
| Tables_in_neutron                       |
+-----------------------------------------+
[...]
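
The four manual checks above could be folded into a single loop run from a host with SSH access to all of them. This is only a minimal sketch: the `check_cmd`, `check_host` and `check_all` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and it assumes interactive SSH access, which the task does not describe.

```shell
#!/bin/bash
# Hosts that need to reach the neutron database (names taken from this task).
HOSTS="cloudcontrol1003.wikimedia.org cloudcontrol1004.wikimedia.org cloudnet1003.eqiad.wmnet cloudnet1004.eqiad.wmnet"
DB_HOST="m5-master.eqiad.wmnet"

# Build the same mysql command used in the manual checks. -p prompts for
# the password interactively, so it does not end up in shell history.
check_cmd() {
    printf "mysql -h %s neutron -u neutron -e 'SHOW TABLES' -p" "$DB_HOST"
}

# Hypothetical helper: with DRY_RUN=1 it only prints what it would run;
# otherwise it runs the check over ssh with a tty (-t) so that the
# password prompt works.
check_host() {
    local host="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh -t $host \"$(check_cmd)\""
    else
        ssh -t "$host" "$(check_cmd)"
    fi
}

# Run the check on every host and report any host where it failed.
check_all() {
    local host
    for host in $HOSTS; do
        check_host "$host" || echo "FAILED: $host" >&2
    done
}
```

Calling `DRY_RUN=1 check_all` prints the four ssh commands without connecting anywhere; calling `check_all` runs them and prompts for the password on each host.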

Mentioned in SAL (#wikimedia-operations) [2018-08-20T11:25:12Z] <arturo> T202261 icinga downtime 1h for cloudcontrol1003.wikimedia.org, cloudcontrol1004.wikimedia.org, cloudnet1003.eqiad.wmnet, cloudnet1004.eqiad.wmnet previous to patch merge

The neutron-server <--> neutron-XXX-agent connection is sometimes unreliable during the initial synchronization.
I had to restart the agents and the server a couple of times before they could see each other.

The last thing I did was restart the server without restarting the agents, and the state was good shortly after that:

root@cloudcontrol1003:~# neutron agent-list
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host          | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
| 468aef2a-8eb6-4382-abba-bc284efd9fa5 | DHCP agent         | cloudnet1004  | nova              | :-)   | True           | neutron-dhcp-agent        |
| 601bef99-b53c-4e6a-b384-65d1feebedff | Metadata agent     | cloudnet1003  |                   | :-)   | True           | neutron-metadata-agent    |
| 8af5d8a1-2e29-40e6-baf0-3cd79a7ac77b | L3 agent           | cloudnet1003  | nova              | :-)   | True           | neutron-l3-agent          |
| 970df1d1-505d-47a4-8d35-1b13c0dfe098 | L3 agent           | cloudnet1004  | nova              | :-)   | True           | neutron-l3-agent          |
| 9f8833de-11a4-4395-8da5-f57fe8326659 | Linux bridge agent | cloudnet1003  |                   | :-)   | True           | neutron-linuxbridge-agent |
| ad3461d7-b79e-4279-921d-5a476e296767 | Linux bridge agent | cloudnet1004  |                   | :-)   | True           | neutron-linuxbridge-agent |
| b2f9da63-2f16-4aa5-9400-ae708a733f91 | Linux bridge agent | cloudvirt1021 |                   | :-)   | True           | neutron-linuxbridge-agent |
| d475e07d-52b3-476e-9a4f-e63b21e1075e | Metadata agent     | cloudnet1004  |                   | :-)   | True           | neutron-metadata-agent    |
| e382a233-e6a0-422e-9d2e-5651082783fc | Linux bridge agent | cloudvirt1022 |                   | :-)   | True           | neutron-linuxbridge-agent |
| ff2a8228-3748-4588-927b-4b6563da9ca0 | DHCP agent         | cloudnet1003  | nova              | :-)   | True           | neutron-dhcp-agent        |
+--------------------------------------+--------------------+---------------+-------------------+-------+----------------+---------------------------+
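
While waiting for the agents to sync, the table above can be filtered down to the agents that are not yet alive. This is a rough sketch, not something that was run for this task: `dead_agents` is a hypothetical helper name, and it assumes the column layout shown above, where the fifth column is `alive` and a healthy agent shows `:-)`.

```shell
# dead_agents reads `neutron agent-list` table output on stdin and prints
# "agent_type@host" for every data row whose alive column is not ":-)".
dead_agents() {
    awk -F'|' '
        # Data rows start with "|"; border rows start with "+" and are
        # skipped, and so is the header row (whose first cell is "id").
        /^\|/ && $2 !~ /^ *id *$/ {
            type = $3; host = $4; alive = $6
            gsub(/^ +| +$/, "", type)    # trim cell padding
            gsub(/^ +| +$/, "", host)
            gsub(/^ +| +$/, "", alive)
            if (alive != ":-)") print type "@" host
        }
    '
}
```

Used as `neutron agent-list | dead_agents`, it prints nothing once every agent reports as alive, which makes it convenient to re-run after each restart.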

Mentioned in SAL (#wikimedia-operations) [2018-08-20T12:01:31Z] <arturo> T202261 extend icinga downtime 1D for cloudcontrol1003.wikimedia.org, cloudcontrol1004.wikimedia.org, cloudnet1003.eqiad.wmnet, cloudnet1004.eqiad.wmnet neutron not properly syncing with agents

aborrero claimed this task.
aborrero added a subscriber: Bstorm.

This is done, but triggered T188589 again. Thanks @Bstorm for his help in limiting DB connections ;-)