Marostegui (Manuel Aróstegui)
User

User Details

User Since
Sep 1 2016, 6:48 AM (63 w, 1 d)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF)

Recent Activity

Yesterday

Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 5:40 PM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 5:22 PM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui updated the task description for T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking).
Fri, Nov 17, 4:49 PM · Epic, DBA, Tracking
Marostegui shifted T180788: Rack and setup db1111 and db1112 from the Restricted Space space to the S1 Public space.
Fri, Nov 17, 2:22 PM · Patch-For-Review, ops-eqiad, DBA, Operations
Marostegui moved T180788: Rack and setup db1111 and db1112 from Triage to In progress on the DBA board.
Fri, Nov 17, 2:20 PM · Patch-For-Review, ops-eqiad, DBA, Operations
Marostegui created T180788: Rack and setup db1111 and db1112.
Fri, Nov 17, 2:20 PM · Patch-For-Review, ops-eqiad, DBA, Operations
Marostegui added a comment to T180700: Rack and setup db1109 and db1110.

Thanks a lot @Cmjohnson for getting them up and running so fast! Really appreciated!

[14:15:14] marostegui@db1109:~$ uptime
 14:15:15 up 12 min,  1 user,  load average: 0.04, 0.16, 0.09
Fri, Nov 17, 2:16 PM · Patch-For-Review, ops-eqiad, Operations, DBA
Marostegui removed a project from T128152: Migrate all old DB rows from windows-1252 to UTF-8 on dawiki: DBA.
Fri, Nov 17, 2:08 PM · Patch-For-Review, Wikimedia-Site-requests, Technical-Debt
Marostegui removed a project from T128153: Migrate all old DB rows from windows-1252 to UTF-8 on svwiki: DBA.
Fri, Nov 17, 2:08 PM · Patch-For-Review, Wikimedia-Site-requests, Technical-Debt
Marostegui removed a project from T128154: Migrate all old DB rows from windows-1252 to UTF-8 on nlwiki: DBA.
Fri, Nov 17, 2:08 PM · Patch-For-Review, Wikimedia-Site-requests, Technical-Debt
Marostegui removed a project from T128155: Migrate all old DB rows from windows-1252 to UTF-8 on dawiktionary: DBA.
Fri, Nov 17, 2:07 PM · Patch-For-Review, Wikimedia-Site-requests, Technical-Debt
Marostegui removed a project from T128156: Migrate all old DB rows from windows-1252 to UTF-8 on svwiktionary: DBA.
Fri, Nov 17, 2:07 PM · Patch-For-Review, Wikimedia-Site-requests, Technical-Debt
Marostegui removed a project from T128151: Migrate all old DB rows from windows-1252 to UTF-8 on enwiki: DBA.
Fri, Nov 17, 2:06 PM · Patch-For-Review, Wikimedia-Site-requests, Technical-Debt
Marostegui moved T114902: Remove numeric entity IDs from database schema from Triage to Blocked external/Not db team on the DBA board.
Fri, Nov 17, 2:04 PM · Goal, DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui added a comment to T131956: Disabling general.confirmeduser from dbreports for using up too much db resources.

Any objection to closing this ticket?
It is pretty old: the problematic job was disabled more than a year ago and the old labsdb servers are going away.

Fri, Nov 17, 1:07 PM · Toolforge, Cloud-Services, DBA
Marostegui moved T57385: Investigate dropping "edit_page_tracking" database table from Wikimedia wikis after archiving it from Triage to Backlog on the DBA board.

@MZMcBride I assume this table is to be dropped, right? If so, can I update its entry in the "Removable" row on T57385 to say "YES"?

Fri, Nov 17, 1:03 PM · Operations, DBA
Marostegui updated the task description for T57385: Investigate dropping "edit_page_tracking" database table from Wikimedia wikis after archiving it.
Fri, Nov 17, 1:02 PM · Operations, DBA
Marostegui created P6339 (An Untitled Masterwork).
Fri, Nov 17, 1:00 PM
Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 10:41 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui added a comment to T180714: s5 primary master db1063 crashed.

I have copied db1063's binlogs over to:

root@dbstore1001:/srv/tmp/T180714# ls -lh
total 21G
-rw-r--r-- 1 root root 21G Nov 17 10:39 db1063_binlogs.tar.gz
Fri, Nov 17, 10:41 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 9:36 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 9:23 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 8:45 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui added a comment to T180714: s5 primary master db1063 crashed.

db1063 is totally broken and won't start: https://phabricator.wikimedia.org/P6337

Fri, Nov 17, 8:44 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui created P6337 (An Untitled Masterwork).
Fri, Nov 17, 8:43 AM
Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 8:26 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui updated the task description for T180714: s5 primary master db1063 crashed.
Fri, Nov 17, 8:00 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui lowered the priority of T180714: s5 primary master db1063 crashed from Unbreak Now! to Normal.

Setting the priority back to Normal as we are back to a normal state now.
Pending things:

  • move dbstore1001 under the new master
  • decide what to do with db1063 (definitely not a master for anything else, we might rebuild it as vslow from db1071 for example)
Fri, Nov 17, 7:34 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui updated the task description for T174569: Schema change for refactored comment storage.
Fri, Nov 17, 6:46 AM · Patch-For-Review, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), Dumps-Generation, Data-Services, Blocked-on-schema-change, DBA
Marostegui moved T180714: s5 primary master db1063 crashed from Triage to In progress on the DBA board.
Fri, Nov 17, 6:36 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui moved T180694: Test moving testwikidatawiki database to s8 replica set on Wikimedia from Triage to In progress on the DBA board.
Fri, Nov 17, 6:36 AM · DBA, MediaWiki-Configuration, Wikidata, Operations
Marostegui moved T180700: Rack and setup db1109 and db1110 from Triage to In progress on the DBA board.
Fri, Nov 17, 6:36 AM · Patch-For-Review, ops-eqiad, Operations, DBA
Marostegui renamed T180714: s5 primary master db1063 crashed from db1063 crashed to s5 primary master db1063 crashed.
Fri, Nov 17, 6:31 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui added a comment to T180714: s5 primary master db1063 crashed.

All the hosts (apart from db1071) are now up to date and ready to be pooled.
I have done: https://gerrit.wikimedia.org/r/#/c/391995/2 but I will wait for another pair of eyes to take a look at that patch before deploying it.

Fri, Nov 17, 5:09 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui added a comment to T180714: s5 primary master db1063 crashed.

Reminder: we have to enable GTID on the slaves.

Fri, Nov 17, 4:59 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui added a comment to T180714: s5 primary master db1063 crashed.

ipblocks done
filearchive done
oldimage done
protected_titles done

Fri, Nov 17, 4:56 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui added a comment to T180714: s5 primary master db1063 crashed.

archive done
doing ipblocks now

Fri, Nov 17, 4:51 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA
Marostegui added a comment to T180714: s5 primary master db1063 crashed.

The reverts for logging and recentchanges on dewiki and wikidatawiki are finished for most of the hosts (still running on db1071, as its hardware isn't as powerful).
Also finishing the revert of the archive table (a very small table: 5G in dewiki and 2G in wikidata).

Fri, Nov 17, 4:48 AM · Wikimedia-Incident, Patch-For-Review, Operations, DBA

Thu, Nov 16

Marostegui added a comment to T174569: Schema change for refactored comment storage.

All of s5 is being reverted because of T180714. The schema change had nothing to do with the crash, but the crash left the schema change half deployed, and that broke replication.
Bad timing.

Thu, Nov 16, 7:02 PM · Patch-For-Review, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), Dumps-Generation, Data-Services, Blocked-on-schema-change, DBA
Marostegui created T180700: Rack and setup db1109 and db1110.
Thu, Nov 16, 3:47 PM · Patch-For-Review, ops-eqiad, Operations, DBA
Marostegui updated the task description for T178359: Support multi-instance on core hosts.
Thu, Nov 16, 3:34 PM · Patch-For-Review, DBA
Marostegui closed T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p as Resolved.
Thu, Nov 16, 3:07 PM · Patch-For-Review, Data-Services, DBA
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

They have been removed from config and from live replication.

Thu, Nov 16, 10:56 AM · Patch-For-Review, Data-Services, DBA
Marostegui added a comment to T178359: Support multi-instance on core hosts.

db1101.s7 is now replicating

Thu, Nov 16, 9:55 AM · Patch-For-Review, DBA
Marostegui added a comment to T149418: Deploy gtid_domain_id flag in our mysql hosts.

New update from the bug:

Andrei Elkin closed MDEV-12012.
-------------------------------
    Fix Version/s: 10.1.30
                   10.2.11
                       (was: 10.1)
       Resolution: Fixed
Thu, Nov 16, 8:58 AM · Patch-For-Review, DBA
Marostegui removed a project from T179508: Admin Score: score for account-age wrong?: DBA.

The sanitization for the user table is as follows:

*************************** 11. row ***************************
             Trigger: user_insert
               Event: INSERT
               Table: user
           Statement: SET NEW.user_password = '', NEW.user_newpassword = '', NEW.user_email = '', NEW.user_options = '', NEW.user_token = '', NEW.user_email_authenticated = '', NEW.user_email_token = '', NEW.user_email_token_expires = '', NEW.user_newpass_time = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 12. row ***************************
             Trigger: user_update
               Event: UPDATE
               Table: user
           Statement: SET NEW.user_password = '', NEW.user_newpassword = '', NEW.user_email = '', NEW.user_options = '', NEW.user_token = '', NEW.user_email_authenticated = '', NEW.user_email_token = '', NEW.user_email_token_expires = '', NEW.user_newpass_time = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
12 rows in set (0.00 sec)
Thu, Nov 16, 8:42 AM · User-Matthewrbowker, XTools
Marostegui updated the task description for T179106: Drop the "wb_terms.wb_terms_language" index.
Thu, Nov 16, 8:21 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui created P6326 (An Untitled Masterwork).
Thu, Nov 16, 7:41 AM
Marostegui updated the task description for T179106: Drop the "wb_terms.wb_terms_language" index.
Thu, Nov 16, 6:55 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui closed T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p as Resolved.

I am going to close this as resolved, as the scope of the ticket is finished.
As nothing else broke in 24h, I am going to leave the replication filters as they are, that is: not replicating in-memory tables.
If something else happens, as Jaime said, we will need to ban the whole user from this host, meaning no backups/redundancy for it.
Once this is fixed on the code side, feel free to reopen this ticket or create a new one so we can remove the replication filters.

Thu, Nov 16, 6:05 AM · Patch-For-Review, Data-Services, DBA

Wed, Nov 15

Marostegui added a comment to T180636: Make Dispenser's principle_links table accessible in new Wiki replica cluster.

I guess that should go to the tools servers

Wed, Nov 15, 9:06 PM · Goal, cloud-services-team (FY2017-18), Data-Services, DBA
Marostegui closed T98110: Pagelinks table contains a row having pl_from = 0 as Resolved.

This is not present anywhere on dewiki core hosts

root@neodymium:/home/marostegui/git/software/dbtools# cat s5.hosts | grep -v labs | while read host port ; do echo $host:$port; mysql --skip-ssl dewiki -h$host -P$port -e "select * from pagelinks where pl_from = 0;" ; done
dbstore2001.codfw.wmnet:3315
db2089.codfw.wmnet:3315
db2086.codfw.wmnet:3315
db2085.codfw.wmnet:3315
db2084.codfw.wmnet:3315
db2083.codfw.wmnet:3306
db2082.codfw.wmnet:3306
db2081.codfw.wmnet:3306
db2080.codfw.wmnet:3306
db2079.codfw.wmnet:3306
db2075.codfw.wmnet:3306
db2038.codfw.wmnet:3306
db2045.codfw.wmnet:3306
db2052.codfw.wmnet:3306
db2059.codfw.wmnet:3306
db2066.codfw.wmnet:3306
db2023.codfw.wmnet:3306
db1095.eqiad.wmnet:3306
dbstore1001.eqiad.wmnet:3306
dbstore1002.eqiad.wmnet:3306
db1070.eqiad.wmnet:3306
db1071.eqiad.wmnet:3306
db1082.eqiad.wmnet:3306
db1087.eqiad.wmnet:3306
db1092.eqiad.wmnet:3306
db1096.eqiad.wmnet:3306
db1099.eqiad.wmnet:3306
db1100.eqiad.wmnet:3306
db1104.eqiad.wmnet:3306
db1106.eqiad.wmnet:3306
db1063.eqiad.wmnet:3306
Wed, Nov 15, 2:23 PM · DBA, Cloud-Services
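The check above relies on every host printing nothing under its name. A minimal sketch of a variant that prints an explicit row count per host, assuming s5.hosts keeps the same "host port" layout on each line. The function name count_rows_per_host is made up for illustration; this is not the script that was actually run:

```shell
# Hypothetical helper, not part of dbtools: prints "host:port <count>"
# for every non-labs host listed in the given file.
count_rows_per_host() {
    local hosts_file="$1" db="$2" query="$3"
    local host port count
    grep -v labs "$hosts_file" | while read -r host port; do
        # Skip blank lines in the hosts file.
        [ -n "$host" ] || continue
        # -N suppresses the column header and -B selects plain batch output,
        # so a COUNT(*) query yields just the number. stdin is redirected
        # so the client cannot consume the rest of the host list.
        count=$(mysql --skip-ssl -N -B "$db" -h"$host" -P"$port" -e "$query" </dev/null)
        printf '%s:%s %s\n' "$host" "$port" "$count"
    done
}
```

It could be called as count_rows_per_host s5.hosts dewiki "select count(*) from pagelinks where pl_from = 0", in which case a host without such rows would be expected to report a count of 0 next to its name instead of an empty line.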
Marostegui removed a project from T149643: Review Icinga alarms with disabled notifications: DBA.

I have reviewed and added the ones for the DBs that could already be enabled back. As soon as puppet starts running they should be picked up.
Thanks for the report!

Wed, Nov 15, 1:55 PM · Operations
Marostegui added a comment to T179628: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser.

@Dispenser but on that missing_entries.sql there are tons of things that wouldn't work with the current and new replicas if I am not missing something.
You are basically using a user database (I assume yours), which will no longer be there as these hosts are read-only.
ie:

DROP TABLE IF EXISTS s52690__p.all_articles, s52690__p.dabs, s52690__p.dabs2page, s52690__p.si, s52690__p.si2page;
CREATE TABLE s52690__p.all_articles
Wed, Nov 15, 1:43 PM · DBA, Data-Services
Marostegui closed T121306: Error: 2013 Lost connection to MySQL server during query on IndexPager::buildQueryInfo (LogPager) as Resolved.

I managed to perform that suppression on the second try, so it does look like a random problem.

Please do feel free to split this issue to a separate report if it helps with troubleshooting it.

Wed, Nov 15, 12:05 PM · MediaWiki-Change-tagging, Wikimedia-log-errors, DBA, MediaWiki-Database, MediaWiki-Logging
Marostegui updated the task description for T179106: Drop the "wb_terms.wb_terms_language" index.
Wed, Nov 15, 11:58 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui created P6316 (An Untitled Masterwork).
Wed, Nov 15, 11:08 AM
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

Unfortunately, new MEMORY tables keep coming, so I am going to filter all the tables whose names start with i_ and u_, because I wouldn't want to filter the whole user.

Wed, Nov 15, 8:43 AM · Patch-For-Review, Data-Services, DBA
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

u_pagelinks also broke

Last_Error: Could not execute Write_rows_v1 event on table s51290__dpl_p.u_pagelinks; The table 'u_pagelinks' is full, Error_code: 1114; handler error HA_ERR_RECORD_FILE_FULL; the event's master log log.164218, end_log_pos 45342262
Wed, Nov 15, 8:27 AM · Patch-For-Review, Data-Services, DBA
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

And as expected, a drop and a recreation of i_psub also happened, recreating the table with MEMORY engine, so it got full again and replication broke.
So I am going to add this table to the replication ignore list until the user fixes the logic.

Wed, Nov 15, 8:01 AM · Patch-For-Review, Data-Services, DBA
Marostegui renamed T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p from labsdb1004 replication broken: table s51290__dpl_p.i_psub; The table 'i_psub' is full to labsdb1004 replication broken: in memory tables from s51290__dpl_p.
Wed, Nov 15, 8:00 AM · Patch-For-Review, Data-Services, DBA
Marostegui updated the task description for T179106: Drop the "wb_terms.wb_terms_language" index.
Wed, Nov 15, 7:48 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

With this new table, the procedure of recreating it as InnoDB doesn't work, as the code is dropping it and recreating it all the time, so it comes back as a MEMORY table again, breaking replication.
So the solution for now is to ignore that table while replicating.
So I have set the replication filter until this gets fixed:

replicate_wild_ignore_table    = s51290__dpl_p.i_orphan_candidates
Wed, Nov 15, 7:29 AM · Patch-For-Review, Data-Services, DBA
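A minimal sketch of how such a filter might be widened at runtime to cover every i_ and u_ table, assuming a MariaDB replica where replicate_wild_ignore_table can be changed while the slave threads are stopped. The function name build_filter_sql is hypothetical and it only prints the statements; nothing here is taken from the actual change. The patterns use LIKE-style wildcards, so the underscore after the prefix is escaped to match a literal underscore:

```shell
# Hypothetical helper: prints the SQL that would set the wildcard ignore
# filter to the given patterns. It does not connect to any server.
build_filter_sql() {
    # Join all pattern arguments with commas when expanding "$*".
    local IFS=,
    printf "STOP SLAVE;\nSET GLOBAL replicate_wild_ignore_table = '%s';\nSTART SLAVE;\n" "$*"
}
```

For example, build_filter_sql 's51290__dpl_p.i\_%' 's51290__dpl_p.u\_%' prints three statements that could be reviewed and then piped to the mysql client on the replica. Note that SET GLOBAL replaces the whole filter list rather than appending to it, and that the setting would still need to go into the server configuration to survive a restart.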
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

And replication got broken by a different in-memory table from the same user:

mysql:root@localhost [s51290__dpl_p]> show create table s51290__dpl_p.i_orphan_candidates;
+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table               | Create Table                                                                                                                                                                                                                                          |
+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| i_orphan_candidates | CREATE TABLE `i_orphan_candidates` (
  `oc_id` int(10) unsigned NOT NULL DEFAULT '0',
  `oc_title` varchar(255) COLLATE utf8mb4_bin NOT NULL DEFAULT '',
  UNIQUE KEY `u_oc` (`oc_title`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin |
+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Wed, Nov 15, 7:09 AM · Patch-For-Review, Data-Services, DBA
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

So the only way to solve this issue is: truncating the table or creating it with InnoDB.
Given this is a MEMORY table, what I have done is:

Wed, Nov 15, 7:08 AM · Patch-For-Review, Data-Services, DBA
Marostegui added a comment to T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.

First attempts to convert the table to InnoDB/Aria are not successful:

Wed, Nov 15, 6:39 AM · Patch-For-Review, Data-Services, DBA
Marostegui moved T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p from Backlog to ToolsDB on the Data-Services board.
Wed, Nov 15, 6:32 AM · Patch-For-Review, Data-Services, DBA
Marostegui moved T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p from Triage to In progress on the DBA board.
Wed, Nov 15, 6:32 AM · Patch-For-Review, Data-Services, DBA
Marostegui created T180560: labsdb1004 replication broken: in memory tables from s51290__dpl_p.
Wed, Nov 15, 6:32 AM · Patch-For-Review, Data-Services, DBA

Tue, Nov 14

Marostegui added a comment to T174569: Schema change for refactored comment storage.

The following hosts are done in s5 eqiad:
dbstore1002
dbstore1001
db1104
db1100
db1106

Tue, Nov 14, 8:53 PM · Patch-For-Review, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), Dumps-Generation, Data-Services, Blocked-on-schema-change, DBA
Marostegui added a comment to T178359: Support multi-instance on core hosts.

db1105 has been fully pooled in s1 and s2.
db1101 from s2 can now be reimaged, converted to multi-instance and pooled where it was scheduled to serve. Not today, though: not until we are sure db1105 works fine for a day (we already have db1103 in s2 and it has been working fine since Monday).

Tue, Nov 14, 3:05 PM · Patch-For-Review, DBA
Marostegui added a comment to T178162: Decommission db1050.

And let's make sure we mark that bad disk as broken so it is not re-used somewhere else :-)

Tue, Nov 14, 3:05 PM · hardware-requests, ops-eqiad, Operations, Patch-For-Review, DBA
Marostegui created P6309 (An Untitled Masterwork).
Tue, Nov 14, 11:43 AM
Marostegui updated the task description for T179106: Drop the "wb_terms.wb_terms_language" index.
Tue, Nov 14, 11:31 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T174569: Schema change for refactored comment storage.
Tue, Nov 14, 10:41 AM · Patch-For-Review, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), Dumps-Generation, Data-Services, Blocked-on-schema-change, DBA
Marostegui added a comment to T174569: Schema change for refactored comment storage.

s6 has been done in eqiad.

Tue, Nov 14, 10:41 AM · Patch-For-Review, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), Dumps-Generation, Data-Services, Blocked-on-schema-change, DBA
Marostegui updated the task description for T178359: Support multi-instance on core hosts.
Tue, Nov 14, 9:48 AM · Patch-For-Review, DBA
Marostegui added a comment to T179106: Drop the "wb_terms.wb_terms_language" index.

This is the non-compressed status:

root@db2052:~# ls -lh /srv/sqldata/wikidatawiki/wb_terms.ibd
-rw-rw---- 1 mysql mysql 718G Nov 13 07:32 /srv/sqldata/wikidatawiki/wb_terms.ibd
Tue, Nov 14, 6:35 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui added a comment to T178359: Support multi-instance on core hosts.

db1103 is now serving s2 and s4 recentchanges services with full normal weight.

Tue, Nov 14, 6:28 AM · Patch-For-Review, DBA

Mon, Nov 13

Marostegui added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

BTW @Marostegui:

aborrero@tools-bastion-03:~$ sql --cluster web hifwiktionary_p
ERROR 1044 (42000): Access denied for user 'u18194'@'%' to database 'hifwiktionary_p'

aborrero@tools-bastion-03:~$ sql --cluster analytics hifwiktionary_p
[.. works fine ..]
MariaDB [hifwiktionary_p]> Bye

aborrero@tools-bastion-03:~$ sql --cluster labsdb hifwiktionary_p
[.. works fine ..]
MariaDB [hifwiktionary_p]> Bye

So we still have some missing grants. Could you please take a look?

Mon, Nov 13, 6:53 PM · cloud-services-team (Kanban), Cloud-Services, DBA
Marostegui edited projects for T180014: User overloading labsdb1003 and making it lag, added: Data-Services; removed Cloud-Services.
Mon, Nov 13, 6:26 PM · Data-Services
Marostegui added a comment to T173647: Prepare and check storage layer for hif.wiktionary.

I found the issue.
The first comment from Madhu (T173647#3748572) was about labsdb1011, so I only gave the grants to that host as I assumed the others worked fine. So 1009 and 1010 (sby) didn't have them.
So all works fine now:

marostegui@tools-bastion-03:~$ mysql --defaults-file=$HOME/replica.my.cnf -h hifwiktionary.analytics.db.svc.eqiad.wmflabs -A
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 7473357
Server version: 10.1.28-MariaDB MariaDB Server
Mon, Nov 13, 5:12 PM · cloud-services-team (Kanban), Cloud-Services, DBA
Marostegui added a comment to T179244: labsdb1009 crashed - OOM.

It happened twice, so we cannot trust labsdb1009. Copying labsdb1010 away and failing it over is a day's work, with very little human intervention.

Mon, Nov 13, 4:24 PM · Patch-For-Review, cloud-services-team (Kanban), DBA
Marostegui added a comment to T179244: labsdb1009 crashed - OOM.

For the record, this is the first table that crashed, probably corrupted because of the crash:

Nov 13 13:45:45 labsdb1009 mysqld[31730]: 2017-11-13 13:45:45 139666025916160 [ERROR] Master 'db1095': Slave SQL: Could not execute Delete_rows_v1 event on table cebwiki.geo_tags; Can't find record in 'geo_tags', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log db1095-bin.003688, end_log_pos 129839964, Gtid 171966669-171966669-1334350810, Internal MariaDB error code: 1032
Mon, Nov 13, 4:19 PM · Patch-For-Review, cloud-services-team (Kanban), DBA
Marostegui triaged T179628: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser as Normal priority.
Mon, Nov 13, 1:13 PM · DBA, Data-Services
Marostegui added a comment to T179628: Consider granting `CREATE TEMPORARY TABLES` to labsdbuser.

We have been discussing this ticket during our meeting and we don't have a clear picture of what problem you are trying to solve here.
Could you give us an example of what is broken and/or what you would like to achieve with these temporary tables?

Mon, Nov 13, 1:13 PM · DBA, Data-Services
Marostegui added a comment to T174569: Schema change for refactored comment storage.

Mentioned in SAL (#wikimedia-operations) [2017-11-13T13:09:39Z] <marostegui> Deploy schema change on db1083 - T174569

Mon, Nov 13, 1:11 PM · Patch-For-Review, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017), Dumps-Generation, Data-Services, Blocked-on-schema-change, DBA
Marostegui updated the task description for T178359: Support multi-instance on core hosts.
Mon, Nov 13, 12:22 PM · Patch-For-Review, DBA
Marostegui updated the task description for T179106: Drop the "wb_terms.wb_terms_language" index.
Mon, Nov 13, 7:41 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui added a comment to T179106: Drop the "wb_terms.wb_terms_language" index.

The index has been dropped from all codfw.
I am optimizing a non-compressed slave to see if the gain is worth the time, as we have seen that for a compressed one it probably is not.

root@neodymium:/home/marostegui/git/software/dbtools# cat s5.hosts | grep codfw | while read host port; do echo $host:$port; mysql --skip-ssl wikidatawiki -h$host -P$port -e "show create table wb_terms\G" | grep wb_terms_language; done
dbstore2001.codfw.wmnet:3315
db2089.codfw.wmnet:3315
db2086.codfw.wmnet:3315
db2085.codfw.wmnet:3315
db2084.codfw.wmnet:3315
db2083.codfw.wmnet:3306
db2082.codfw.wmnet:3306
db2081.codfw.wmnet:3306
db2080.codfw.wmnet:3306
db2079.codfw.wmnet:3306
db2075.codfw.wmnet:3306
db2038.codfw.wmnet:3306
db2045.codfw.wmnet:3306
db2052.codfw.wmnet:3306
db2059.codfw.wmnet:3306
db2066.codfw.wmnet:3306
db2023.codfw.wmnet:3306
Mon, Nov 13, 7:32 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T179106: Drop the "wb_terms.wb_terms_language" index.
Mon, Nov 13, 7:29 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui claimed T180045: Review and deploy schema change on dropping oresc_rev_predicted_model index.
Mon, Nov 13, 7:24 AM · DBA, Blocked-on-schema-change, MediaWiki-extensions-ORES, User-Ladsgroup, Scoring-platform-team (Current)
Marostegui closed T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index as Resolved.

This is all done:

root@neodymium:/home/marostegui/git/software/dbtools# cat s5.hosts | grep -v "labs" | while read host port; do echo $host:$port; mysql --skip-ssl wikidatawiki -h$host -P$port -e "show create table wb_items_per_site\G" | grep wb_ips_site_page; done
dbstore2001.codfw.wmnet:3315
db2089.codfw.wmnet:3315
db2086.codfw.wmnet:3315
db2085.codfw.wmnet:3315
db2084.codfw.wmnet:3315
db2083.codfw.wmnet:3306
db2082.codfw.wmnet:3306
db2081.codfw.wmnet:3306
db2080.codfw.wmnet:3306
db2079.codfw.wmnet:3306
db2075.codfw.wmnet:3306
db2038.codfw.wmnet:3306
db2045.codfw.wmnet:3306
db2052.codfw.wmnet:3306
db2059.codfw.wmnet:3306
db2066.codfw.wmnet:3306
db2023.codfw.wmnet:3306
db1095.eqiad.wmnet:3306
dbstore1001.eqiad.wmnet:3306
dbstore1002.eqiad.wmnet:3306
db1070.eqiad.wmnet:3306
db1071.eqiad.wmnet:3306
db1082.eqiad.wmnet:3306
db1087.eqiad.wmnet:3306
db1092.eqiad.wmnet:3306
db1096.eqiad.wmnet:3306
db1099.eqiad.wmnet:3306
db1100.eqiad.wmnet:3306
db1104.eqiad.wmnet:3306
db1106.eqiad.wmnet:3306
db1063.eqiad.wmnet:3306
Mon, Nov 13, 7:23 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index.
Mon, Nov 13, 7:23 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index.
Mon, Nov 13, 7:16 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index.
Mon, Nov 13, 7:14 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index.
Mon, Nov 13, 7:10 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index.
Mon, Nov 13, 7:06 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui updated the task description for T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index.
Mon, Nov 13, 7:04 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata
Marostegui added a comment to T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index.

index dropped from codfw:

root@neodymium:/home/marostegui/git/software/dbtools# cat s5.hosts | grep codfw | while read host port; do echo $host:$port; mysql --skip-ssl wikidatawiki -h$host -P$port -e "show create table wb_items_per_site\G" | grep wb_ips_site_page; done
dbstore2001.codfw.wmnet:3315
db2089.codfw.wmnet:3315
db2086.codfw.wmnet:3315
db2085.codfw.wmnet:3315
db2084.codfw.wmnet:3315
db2083.codfw.wmnet:3306
db2082.codfw.wmnet:3306
db2081.codfw.wmnet:3306
db2080.codfw.wmnet:3306
db2079.codfw.wmnet:3306
db2075.codfw.wmnet:3306
db2038.codfw.wmnet:3306
db2045.codfw.wmnet:3306
db2052.codfw.wmnet:3306
db2059.codfw.wmnet:3306
db2066.codfw.wmnet:3306
db2023.codfw.wmnet:3306
Mon, Nov 13, 6:49 AM · DBA, MediaWiki-extensions-WikibaseRepository, Wikidata