Page MenuHomePhabricator

Remove als.wik(ibooks|iquote|tionary), mo.wik(ipedia|tionary) views from replicas
Closed, ResolvedPublic

Description

See parent task. The Board of Trustees approves the deletion of these projects.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 3 2017, 11:14 AM
Marostegui edited projects, added Wikimedia-Rdbms; removed DBA.Dec 3 2017, 11:18 AM
Marostegui added a subscriber: Marostegui.

I don't think we delete the databases directly in our databases, we should instead go thru MediaWiki to make that happen.

Marostegui edited projects, added DBA; removed Wikimedia-Rdbms.Dec 3 2017, 11:21 AM

That should be reviewed them to see if there could be some sort of inconsistency created, specially on es servers

@Marostegui Yes, of course. Can you please do me a favor and run show tables; on centralauth? Because I think the docs I linked are outdated. Namely, I can't see a localnames table. This are my views:

MariaDB [centralauth_p]> show tables;
+---------------------------+
| Tables_in_centralauth_p   |
+---------------------------+
| global_group_permissions  |
| global_group_restrictions |
| global_user_groups        |
| globalblocks              |
| globaluser                |
| localuser                 |
| renameuser_status         |
| wikiset                   |
+---------------------------+
8 rows in set (0.00 sec)

Thank you.

Yeah, the view doesn't contain all the existing tables:

| global_group_permissions  |
| global_group_restrictions |
| global_user_groups        |
| globalblocks              |
| globalnames               |
| globaluser                |
| localnames                |
| localuser                 |
| oathauth_users            |
| renameuser_queue          |
| renameuser_status         |
| securepoll_lists          |
| spoofuser                 |
| users_to_rename           |
| wikiset                   |

Weird. Can you please @Marostegui run SELECT COUNT(*) FROM localnames WHERE ln_wiki='mowiki'; and SELECT COUNT(*) FROM localnames WHERE ln_wiki='mowiktionary'; so we know exactly how much users should be deleted?

Also, is it normal not to display to the replica those tables? If they contain private data, shouldn't the specific fields be NULLed instead? Please advice.

Thank you!

Weird. Can you please @Marostegui run SELECT COUNT(*) FROM localnames WHERE ln_wiki='mowiki'; and SELECT COUNT(*) FROM localnames WHERE ln_wiki='mowiktionary'; so we know exactly how much users should be deleted?

root@db1094[centralauth]> SELECT COUNT(*) FROM localnames WHERE ln_wiki='mowiki';
+----------+
| COUNT(*) |
+----------+
|     2156 |
+----------+
1 row in set (0.00 sec)

root@db1094[centralauth]> SELECT COUNT(*) FROM localnames WHERE ln_wiki='mowiktionary';
+----------+
| COUNT(*) |
+----------+
|      401 |
+----------+
1 row in set (0.00 sec)

root@db1094[centralauth]>

Also, is it normal not to display to the replica those tables? If they contain private data, shouldn't the specific fields be NULLed instead? Please advice.

As per T103011#3536648 (don't know if you have access to that task), localnames table can be replicated but it needs to have a views-redacted version.

Thanks. Funny thing is that localuser count seems to be the same as localnames. See T169450#3806717.

And nope, I don't have access to that task. Only for VIP apparently ;-)

Regards.

Thanks. Funny thing is that localuser count seems to be the same as localnames. See T169450#3806717.

Yeah, don't know the internals for that correspondence :-(

And nope, I don't have access to that task. Only for VIP apparently ;-)

It basically requires an NDA because of the content of the task :-)

Strainu added a subscriber: Strainu.Dec 3 2017, 1:08 PM
Ebe123 added a subscriber: Ebe123.Dec 4 2017, 12:04 AM

No, you don't. That wiki page explicitely states

Actually deleting the data from all of the database servers would be quite a bit more complicated, and is usually not worth doing.

What needs to be removed is the _p view and the database from the replicas, but not the actual production db hosts. We might need a bit more help from DBA in order to clean up some tables (which seems to be relatively well-documented at that wiki page), but I think we can agree to "decline" the deletion of the actual database from the prod hosts. (I'll leave it to someone else to decide whether to change this task accordingly or decline it.)

[ /me suggest that that page be improved ]

No, you don't. That wiki page explicitely states

Actually deleting the data from all of the database servers would be quite a bit more complicated, and is usually not worth doing.

What needs to be removed is the _p view and the database from the replicas, but not the actual production db hosts. We might need a bit more help from DBA in order to clean up some tables (which seems to be relatively well-documented at that wiki page), but I think we can agree to "decline" the deletion of the actual database from the prod hosts. (I'll leave it to someone else to decide whether to change this task accordingly or decline it.)

Removing the views from the replicas can be easily done by the cloud-services-team.
I am all fordward NOT deleting the databases directly as I have stated already, it is a pain, and it is prone to cause more errors than benefits.
So, +1 to decline this task and create a task when everything is done to delete the labs replicas views.

Or just set it as stalled and narrow the scope? Note that I don't know the exact thing that dba has to do, but I guess you will need to be involved so I'm not sure if declining this to reopen a new task later is of any benefit, unless I'm wrong.

I am fine with that. We can help, but we are not responsible for cleaning up the views.

MarcoAurelio changed the task status from Open to Stalled.Dec 9 2017, 1:50 PM
MarcoAurelio triaged this task as Lowest priority.

I would suggest we rename this task to remove mowiki and mowiktionary views from wiki replicas.
Obviously, we should also move those wikis to the delete dblist and all that stuff.

@Marostegui Apparently you do: https://wikitech.wikimedia.org/wiki/Delete_a_wiki
Removing the views from the replicas can be easily done by the cloud-services-team.
I am all fordward NOT deleting the databases directly as I have stated already, it is a pain, and it is prone to cause more errors than benefits.

The wiki page has the following bullet points for "clean up wiki replica data":

  • Delete the _p view and backing database on all replica servers
  • Run maintain-meta_p --all-databases --purgeon all replica database servers

I do not quite know what the "backing database" here means. So, just to be clear, in my imagination/understanding of wiki replica layout, the presence of the database on replica hosts (replication) and the possibility of users to actually connect to that database/use it/get data from it (views) is completely seperate. So if we remove all the views for a database nobody (except cloud-services-team, DBA, other Operations) will be able to use that database any more, right? The term "Delete the backing database" actually sounds like dropping the database from the replicas (not from prod hosts though).

Basically, I understood that we're going to leave the dbs on prod hosts alone. Yet, it's not clear to me whether

  • databases will be dropped from replica hosts or
  • they will stay there as well and just all views are to be removed.

Possibly @Marostegui can help with that, I'd like to clarify the documentation on this.

EddieGP renamed this task from Kill mowiki and mowiktionary databases; scheduled for deletion to Remove mowiki and mowiktionary databases from replicas.Dec 9 2017, 5:34 PM

The wiki page has the following bullet points for "clean up wiki replica data":

  • Delete the _p view and backing database on all replica servers
  • Run maintain-meta_p --all-databases --purgeon all replica database servers

I believe that is is, but cloud-services-team should know better.

I do not quite know what the "backing database" here means. So, just to be clear, in my imagination/understanding of wiki replica layout, the presence of the database on replica hosts (replication) and the possibility of users to actually connect to that database/use it/get data from it (views) is completely seperate. So if we remove all the views for a database nobody (except cloud-services-team, DBA, other Operations) will be able to use that database any more, right?

That is correct.
Once the view is removed, no one will be able to access the database itself.

The term "Delete the backing database" actually sounds like dropping the database from the replicas (not from prod hosts though).
Basically, I understood that we're going to leave the dbs on prod hosts alone. Yet, it's not clear to me whether

  • databases will be dropped from replica hosts or
  • they will stay there as well and just all views are to be removed.

Possibly @Marostegui can help with that, I'd like to clarify the documentation on this.

I would not drop the databases from the replicas, so they are consistent with production. I would just remove the views.
Basically, whatever we do in production with them, I would do the same in the replicas.
If we delete the databases itself (which we have decided not to), they would need to be deleted from the replicas.
If we keep then in production, we should keep them in the replicas (but removing the views).

Those databases should be moved to the https://github.com/wikimedia/operations-mediawiki-config/blob/master/dblists/deleted.dblist file too (I assume).

If you don't mind, I will rename the task to clarify that the views is what we have to actually drop, once we are ready for it.

Marostegui renamed this task from Remove mowiki and mowiktionary databases from replicas to Remove mowiki and mowiktionary views from replicas.Dec 9 2017, 6:52 PM
Marostegui edited projects, added cloud-services-team; removed DBA.
Marostegui added subscribers: chasemp, bd808, Andrew and 2 others.

That is correct. Once the view is removed, no one will be able to access the database itself. [...] I would not drop the databases from the replicas, so they are consistent with production. I would just remove the views.

Thanks, this was helpful to make me understand it and clarify that vague term of "backing database". I've updated the wiki page accordingly: https://wikitech.wikimedia.org/w/index.php?title=Delete_a_wiki&diff=1777830&oldid=1777798

Those databases should be moved to the https://github.com/wikimedia/operations-mediawiki-config/blob/master/dblists/deleted.dblist file too (I assume).

You're assuming right. That is already documented and @MarcoAurelio thankfully already proposed https://gerrit.wikimedia.org/r/#/c/394846/ to do exactly this.

If you don't mind, I will rename the task to clarify that the views is what we have to actually drop, once we are ready for it.

Thanks for that, too.

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.Dec 10 2017, 4:59 AM
EddieGP changed the task status from Stalled to Open.Dec 12 2017, 4:47 PM

The wikis are now redirected. To do here, according to the docs:

  • Delete the _p view as well as all other views on all replica servers. The database itself is not dropped.
  • Run maintain-meta_p --all-databases --purge on all replica database servers

If you find anything more needs to be done, please extend the documentation :)

EddieGP renamed this task from Remove mowiki and mowiktionary views from replicas to Remove als.wik(ibooks|iquote|tionary), mo.wik(ipedia|tionary) views from replicas.Dec 13 2017, 11:44 AM
bd808 claimed this task.Jan 4 2018, 9:55 PM

Change 402137 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] wmcs: Add database drop support to maintain-views

https://gerrit.wikimedia.org/r/402137

Change 402137 merged by Madhuvishy:
[operations/puppet@production] wmcs: Add database drop support to maintain-views

https://gerrit.wikimedia.org/r/402137

bd808 closed this task as Resolved.Jan 5 2018, 10:27 PM
bd808 added a subscriber: jcrespo.

labsdb1009:

$ sudo maintain-views --debug --drop --databases mowiki
2018-01-05 21:52:31,905 DEBUG Removing 0 dbs as sensitive
Drop mowiki_p? [y/N] y
2018-01-05 21:52:33,651 DEBUG SQL: DROP DATABASE `mowiki_p`;
labsdb1009:~
bd808$ sudo maintain-views --debug --drop --databases mowiki
2018-01-05 21:52:38,509 DEBUG Removing 0 dbs as sensitive
2018-01-05 21:52:38,510 WARNING DB mowiki_p does not exist
labsdb1009:~
bd808$ sudo maintain-views --debug --drop --databases mowiki mowiktionary alswik
i alswikibooks alswikiquote alswiktionary
2018-01-05 21:52:58,873 DEBUG Removing 0 dbs as sensitive
Drop alswiktionary_p? [y/N] y
2018-01-05 21:53:00,965 DEBUG SQL: DROP DATABASE `alswiktionary_p`;
Drop mowiktionary_p? [y/N] y
2018-01-05 21:53:02,958 DEBUG SQL: DROP DATABASE `mowiktionary_p`;
Drop alswiki_p? [y/N] y
2018-01-05 21:53:04,544 DEBUG SQL: DROP DATABASE `alswiki_p`;
2018-01-05 21:53:04,566 WARNING DB mowiki_p does not exist
Drop alswikiquote_p? [y/N] y
2018-01-05 21:53:06,566 DEBUG SQL: DROP DATABASE `alswikiquote_p`;
Drop alswikibooks_p? [y/N] y
2018-01-05 21:53:07,666 DEBUG SQL: DROP DATABASE `alswikibooks_p`;

labsdb1011:

$ sudo maintain-views --debug --drop --databases mowiki mowiktionary alswiki alswikibooks alswikiquote alswiktionary
2018-01-05 22:08:27,497 DEBUG Removing 0 dbs as sensitive
Drop alswiki_p? [y/N] y
2018-01-05 22:08:29,220 DEBUG SQL: DROP DATABASE `alswiki_p`;
Drop alswiktionary_p? [y/N] y
2018-01-05 22:08:30,340 DEBUG SQL: DROP DATABASE `alswiktionary_p`;
Drop mowiktionary_p? [y/N] y
2018-01-05 22:08:31,700 DEBUG SQL: DROP DATABASE `mowiktionary_p`;
Drop alswikiquote_p? [y/N] y
2018-01-05 22:08:32,597 DEBUG SQL: DROP DATABASE `alswikiquote_p`;
Drop mowiki_p? [y/N] y
2018-01-05 22:08:33,381 DEBUG SQL: DROP DATABASE `mowiki_p`;
Drop alswikibooks_p? [y/N] y
2018-01-05 22:08:34,194 DEBUG SQL: DROP DATABASE `alswikibooks_p`;

labsdb1010:

$ sudo maintain-views --debug --drop --databases mowiki mowiktionary alswiki alswikibooks alswikiquote alswiktionary
2018-01-05 21:53:25,423 DEBUG Removing 0 dbs as sensitive
Drop mowiktionary_p? [y/N] y
2018-01-05 21:53:27,133 DEBUG SQL: DROP DATABASE `mowiktionary_p`;

labsdb1010 hung here. @madhuvishy looked into the running processes and found that it is waiting on a metadata lock:

| 22704 | maintainviews | localhost | NULL | Query | 390 | Waiting for table metadata lock | DROP DATABASE `mowiktionary_p` | 0.000 |

There is another proc also waiting on the metadata lock:

| 1073 | root | localhost | mysql | Query | 18039 | Waiting for table metadata lock | ALTER TABLE event MODIFY name char(64) CHARACTER SET utf8 NOT NULL default '' | 0.000 |

This seems likely to be a long running alter that @jcrespo is running on the host based on ps output.

I stopped the hung process and then tried it again. This time it reported the mowiktionary_p was gone. Seems like the metadata lock was for cleaning something up after the drop? I went ahead and ran the drop and ^C'd when hung for each of mowiki_p, mowiktionary_p, alswiki_p, alswikibooks_p, alswikiquote_p, and alswiktionary_p.

I then verified using sql that the databases were reported as unknown on both the analytics and web Wiki Replica clusters.

bd808 added a comment.Jan 5 2018, 10:39 PM

Oops. I forgot to purge the meta_p data as well. That has now been done on labsdb1009, labsdb1010, and labsdb1011.

I also updated https://wikitech.wikimedia.org/wiki/Delete_a_wiki#To_delete_a_wiki with the new maintain-views --clean command.

EddieGP reopened this task as Open.Jan 5 2018, 11:57 PM

Drop alswiki_p? [y/N] y
2018-01-05 21:53:04,544 DEBUG SQL: DROP DATABASE alswiki_p;

alswiki is still alive, the other three als* redirect there. Please restore that view.

bd808 closed this task as Resolved.Jan 6 2018, 12:48 AM
$ sudo maintain-views --debug --databases alswiki
2018-01-06 00:33:39,962 DEBUG Removing 0 dbs as sensitive
2018-01-06 00:33:39,964 DEBUG SQL: CREATE DATABASE `alswiki_p`;
Traceback (most recent call last):
  File "/usr/local/sbin/maintain-views", line 480, in <module>
    ops.execute(fullviews, customviews)
  File "/usr/local/sbin/maintain-views", line 268, in execute
    "CREATE DATABASE `{}`;".format(self.db_p)
  File "/usr/local/sbin/maintain-views", line 53, in write_execute
    self.cursor.execute(query)
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 166, in execute
    result = self._query(query)
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 852, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1053, in _read_query_result
    result.read()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1336, in read
    first_packet = self.connection._read_packet()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1010, in _read_packet
    packet.check_error()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 393, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/lib/python3/dist-packages/pymysql/err.py", line 107, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.OperationalError: (1044, "Access denied for user 'maintainviews'@'localhost' to database 'alswiki_p'")

@madhuvishy fixed this with her root super powers and then maintain-views ran as expected on labsdb1009, labsdb1010, and labsdb1011.

$ sql alswiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 1644927
Server version: 10.1.30-MariaDB MariaDB Server

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

(u3518@alswiki.labsdb) [alswiki_p]>

Change 402458 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] wikireplica: Remove dns for deleted wikis als.wiki* and mo.wik*

https://gerrit.wikimedia.org/r/402458

Change 402458 merged by Madhuvishy:
[operations/puppet@production] wikireplica: Remove dns for deleted wikis als.wiki* and mo.wik*

https://gerrit.wikimedia.org/r/402458

madhuvishy added a comment.EditedJan 6 2018, 1:20 AM

@bd808 jfyi I also cleaned up the dns entries for these replicas, see patch in above comments. Fixed https://wikitech.wikimedia.org/wiki/Delete_a_wiki#To_delete_a_wiki too.

Please don't run actions involving DDLs while mysql_upgrade is in process. How to know if mysql_upgrade is running?- it will be run immediately after a reboot in most cases for one or a few hours:

$ date; mysql -e "SHOW STATUS like 'uptime'"
Sat Jan  6 14:54:44 UTC 2018
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Uptime        | 79428 |
+---------------+-------+

@jcrespo Thanks for pointing that out! Will add that to our docs.