Today

Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

Thanks for providing the context Brooke!
I will remove the temporary index tomorrow and do a --replace for that specific table on that new wiki, which shouldn't block you anyway.

Thu, Sep 19, 4:42 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

Yep, it was intentional

Thu, Sep 19, 4:08 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

Let's wait for @Bstorm then before closing this as resolved
Thanks!

Thu, Sep 19, 2:39 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

So everything done?
From my side I can connect fine and query the views.

Thu, Sep 19, 2:30 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

For what it's worth, I can access the hosts from the bastion with --cluster analytics and --cluster web, and I can select fine from the tables, so most of the work is done successfully! Just that small error is pending a fix :-)

Thu, Sep 19, 2:23 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

@JHedden after your +1, I have merged the change. Can you run puppet and try again to see if that index error is gone?

Thu, Sep 19, 2:21 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
jcrespo awarded T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4] a Like token.
Thu, Sep 19, 2:03 PM · Patch-For-Review, DBA
Marostegui updated subscribers of T219374: Prepare and check storage layer for hi.wikisource.

OK, the replica DNS entries are set up for hiwikisource now.
I went through the full instructions for cloud services https://wikitech.wikimedia.org/wiki/Add_a_wiki#Cloud_Services
Creating the replica indexes failed with

/usr/local/sbin/maintain-replica-indexes --database hiwikisource --debug
2019-09-19 13:35:22,973 DEBUG
    SELECT table_schema FROM information_schema.tables
    WHERE table_name='archive' and table_schema
    like 'hiwikisource' and table_type='BASE TABLE'
2019-09-19 13:35:22,987 DEBUG SHOW INDEX from hiwikisource.archive
2019-09-19 13:35:22,989 INFO
    ALTER TABLE hiwikisource.archive ADD KEY user_timestamp (ar_user, ar_timestamp)
Traceback (most recent call last):
  File "/usr/local/sbin/maintain-replica-indexes", line 159, in <module>
    main()
  File "/usr/local/sbin/maintain-replica-indexes", line 150, in main
    write_index(change_cursor, db_name, index, args.dryrun, args.debug)
  File "/usr/local/sbin/maintain-replica-indexes", line 93, in write_index
    cursor.execute(query)
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 166, in execute
    result = self._query(query)
  File "/usr/lib/python3/dist-packages/pymysql/cursors.py", line 322, in _query
    conn.query(q)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 852, in query
    self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1053, in _read_query_result
    result.read()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1336, in read
    first_packet = self.connection._read_packet()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 1010, in _read_packet
    packet.check_error()
  File "/usr/lib/python3/dist-packages/pymysql/connections.py", line 393, in check_error
    err.raise_mysql_exception(self._data)
  File "/usr/lib/python3/dist-packages/pymysql/err.py", line 107, in raise_mysql_exception
    raise errorclass(errno, errval)
pymysql.err.InternalError: (1072, "Key column 'ar_user' doesn't exist in table")
Thu, Sep 19, 2:00 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
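
The traceback above fails because the requested index names a column (`ar_user`) that the table does not have. A minimal sketch of a pre-check that could catch this before running the ALTER, assuming the table's column list has already been fetched (for example from `information_schema.columns`). The function name and the column list are hypothetical and are not taken from `maintain-replica-indexes`:

```python
def missing_key_columns(index_columns, table_columns):
    """Return the index columns that are absent from the table, in order."""
    present = set(table_columns)
    return [col for col in index_columns if col not in present]


# Illustrative column list for an archive table without ar_user (assumption).
archive_columns = ["ar_id", "ar_namespace", "ar_title", "ar_actor", "ar_timestamp"]

missing = missing_key_columns(["ar_user", "ar_timestamp"], archive_columns)
if missing:
    # Skip the ALTER instead of letting the server raise error 1072.
    print("skipping index user_timestamp, missing columns:", missing)
```

With a check like this the script could log the skipped index and continue with the remaining ones rather than aborting.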
Marostegui added a comment to T223151: Review special replica partitioning of certain tables by `xx_user`.

db1089 has been serving the logpager service for almost 8h now, handling 50% of its traffic:

"logpager": {
   "db1089": 2,
   "db1099:3311": 1,
   "db1105:3311": 1
Thu, Sep 19, 1:46 PM · Performance, Core Platform Team Legacy (Watching / External), DBA
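
The 50% figure follows from the weights shown above, assuming traffic is split in proportion to each host's weight (an assumption about the load balancer, not something stated in the comment). A small sketch of that arithmetic, with a hypothetical helper name:

```python
def traffic_shares(weights):
    """Map each host to its fraction of traffic, proportional to its weight."""
    total = sum(weights.values())
    return {host: weight / total for host, weight in weights.items()}


# Weights copied from the logpager group quoted above.
logpager = {"db1089": 2, "db1099:3311": 1, "db1105:3311": 1}
shares = traffic_shares(logpager)

for host, share in shares.items():
    print(f"{host}: {share:.0%}")
```

db1089 has weight 2 out of a total of 4, which gives it half of the logpager traffic.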
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

DNS and all the other operations done too?
I still cannot access the database from the tools bastion:

marostegui@tools-sgebastion-07:~$ sql hiwikisource_p
Could not find requested database
Thu, Sep 19, 1:29 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

Excellent, you can proceed on labsdb1010 and labsdb1011 too. I have created the DB there too.

Thu, Sep 19, 1:23 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

Can you try again on labsdb1009?

Thu, Sep 19, 1:18 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

Probably, let me check

Thu, Sep 19, 1:17 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui moved T230783: Switchover s3 primary database master db1075 -> db1078 - 24th Sept @05:00 UTC from Next to In progress on the DBA board.
Thu, Sep 19, 12:18 PM · Patch-For-Review, DBA, Operations
Marostegui placed T219374: Prepare and check storage layer for hi.wikisource up for grabs.

This is ready for cloud-services-team - the views can be created on labsdb1009-labsdb1012.
The check data scripts have finished and everything is fine.
I have also changed the grants for labsdbuser role.

root@cumin1001:/home/marostegui/git/tendril/bin# for i in `seq 1009 1012`; do echo labsdb$i; mysql.py -hlabsdb$i -e " show grants for labsdbuser" | grep hiwikisource ;done
labsdb1009
GRANT SELECT, SHOW VIEW ON `hiwikisource\\_p`.* TO 'labsdbuser'
labsdb1010
GRANT SELECT, SHOW VIEW ON `hiwikisource\\_p`.* TO 'labsdbuser'
labsdb1011
GRANT SELECT, SHOW VIEW ON `hiwikisource\\_p`.* TO 'labsdbuser'
labsdb1012
GRANT SELECT, SHOW VIEW ON `hiwikisource\\_p`.* TO 'labsdbuser'
Thu, Sep 19, 12:15 PM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui triaged T233273: labsdb1009 broken PSU as Normal priority.
Thu, Sep 19, 12:09 PM · Operations, DC-Ops, ops-eqiad, DBA
Marostegui added a comment to T233273: labsdb1009 broken PSU.

This server is out of warranty and @RobH has created a procurement task.

Thu, Sep 19, 12:09 PM · Operations, DC-Ops, ops-eqiad, DBA
Marostegui claimed T233281: Check/remove unused databases following labpuppetmaster deprecation.

Done:

root@db1133.eqiad.wmnet[labspuppet]> show tables;
+-------------------------+
| Tables_in_labspuppet    |
+-------------------------+
| TO_DROP_hieraassignment |
| TO_DROP_prefix          |
| TO_DROP_roleassignment  |
+-------------------------+
3 rows in set (0.00 sec)
Thu, Sep 19, 8:12 AM · DBA, Operations
Marostegui added a comment to T233281: Check/remove unused databases following labpuppetmaster deprecation.

What we normally do is, rename all the tables on the given database and leave it like that for a few days to check that nothing is really using it, and then drop it.

Thu, Sep 19, 8:07 AM · DBA, Operations
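
The rename-then-wait step described above can be sketched as a small statement generator. This is only an illustration: the `TO_DROP_` prefix matches the table listing in the later entry, the original table names are inferred by stripping that prefix, and the helper name is hypothetical.

```python
def rename_statements(database, tables, prefix="TO_DROP_"):
    """Build one RENAME TABLE statement per table, adding the given prefix."""
    return [
        f"RENAME TABLE `{database}`.`{table}` TO `{database}`.`{prefix}{table}`;"
        for table in tables
    ]


# Table names inferred from the TO_DROP_* listing for labspuppet (assumption).
statements = rename_statements(
    "labspuppet", ["hieraassignment", "prefix", "roleassignment"]
)
for statement in statements:
    print(statement)
```

If anything still depends on a table, it fails loudly during the waiting period, and the rename is easy to reverse before the final drop.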
Marostegui added a comment to T231858: Archive data on eventlogging MySQL to analytics replica before decomisioning .

db1108 also has the staging database, so that might be the reason for the different sizes.

Thu, Sep 19, 7:50 AM · Analytics-Kanban, Analytics, Analytics-EventLogging
Marostegui moved T233281: Check/remove unused databases following labpuppetmaster deprecation from Triage to Backlog on the DBA board.

This is what m5 has at the moment:

+------------------------+
| Database               |
+------------------------+
| designate              |
| designate_pool_manager |
| glance                 |
| heartbeat              |
| information_schema     |
| keystone               |
| labsdbaccounts         |
| labspuppet             |
| labswiki               |
| labtestwiki            |
| mysql                  |
| neutron                |
| nova                   |
| nova_api               |
| nova_api_eqiad1        |
| nova_eqiad1            |
| ops                    |
| performance_schema     |
| striker                |
| sys                    |
| test_labsdbaccounts    |
| testreduce_0715        |
| testreduce_vd          |
+------------------------+
Thu, Sep 19, 7:39 AM · DBA, Operations
Marostegui added a comment to T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].

dbproxy1014 has been tested and it is now m1-master. In a couple of days I will revert this change as dbproxy1014 is in a rack that requires maintenance to the PDU, so I will move back m1-master to dbproxy1001 until that has passed.
Given that dbproxy1014 works fine, dbproxy1006 can go away.

Thu, Sep 19, 6:48 AM · Patch-For-Review, DBA
Marostegui added a comment to T219374: Prepare and check storage layer for hi.wikisource.

I have sanitized hiwikisource on the sanitarium hosts (db1124:3313 and db2094:3313) and checked that the users were sanitized correctly. My own user, which I created right after placing the triggers, was created with the redacted fields already.
Now I am running a private data check on the sanitarium hosts as well as on labsdb1009 and labsdb1012 to be doubly sure.

Thu, Sep 19, 6:27 AM · Core Platform Team Workboards (Clinic Duty Team), cloud-services-team, Analytics, Data-Services, DBA
Marostegui added a comment to T223151: Review special replica partitioning of certain tables by `xx_user`.

I have captured 30 minutes of read queries on db1105 (rc on enwiki) and replayed them on db1089 (non-partitioned enwiki slave), and nothing has stood out, especially nothing related to the logging table.
So I am going to pool db1089 on the logging service for a few hours and see what we get.

Thu, Sep 19, 6:00 AM · Performance, Core Platform Team Legacy (Watching / External), DBA
Marostegui updated the task description for T228258: Decommission db2043-db2069.
Thu, Sep 19, 5:13 AM · Operations, DBA
Marostegui moved T233186: Decommission db2055.codfw.wmnet from Backlog to Ready for Decommission on the decommission board.

This host is ready for DC-Ops to decommission

Thu, Sep 19, 5:13 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui reassigned T233186: Decommission db2055.codfw.wmnet from Marostegui to RobH.
Thu, Sep 19, 5:12 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui updated the task description for T233186: Decommission db2055.codfw.wmnet.
Thu, Sep 19, 5:10 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui added a project to T233240: Remove MySQL aliasing for user_newtalk indexes: User-Marostegui.
Thu, Sep 19, 5:02 AM · User-Marostegui, Patch-For-Review, Schema-change, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team)
Marostegui added a comment to T233240: Remove MySQL aliasing for user_newtalk indexes.

Thanks - I might include this alter along with the other ones we are doing for T233135

Thu, Sep 19, 4:52 AM · User-Marostegui, Patch-For-Review, Schema-change, MediaWiki-General, Core Platform Team Workboards (Clinic Duty Team)
Marostegui moved T233273: labsdb1009 broken PSU from Triage to In progress on the DBA board.
Thu, Sep 19, 4:50 AM · Operations, DC-Ops, ops-eqiad, DBA
Marostegui added a subtask for T233248: Power issue in eqiad A1: T233273: labsdb1009 broken PSU.
Thu, Sep 19, 4:50 AM · ops-eqiad, Operations
Marostegui added a parent task for T233273: labsdb1009 broken PSU: T233248: Power issue in eqiad A1.
Thu, Sep 19, 4:50 AM · Operations, DC-Ops, ops-eqiad, DBA
Marostegui created T233273: labsdb1009 broken PSU.
Thu, Sep 19, 4:50 AM · Operations, DC-Ops, ops-eqiad, DBA

Yesterday

Marostegui triaged T223151: Review special replica partitioning of certain tables by `xx_user` as Normal priority.
Wed, Sep 18, 3:19 PM · Performance, Core Platform Team Legacy (Watching / External), DBA
Marostegui added a comment to T223151: Review special replica partitioning of certain tables by `xx_user`.

Tomorrow I will try to place a non-partitioned host on the logging enwiki section for some time and enable the slow log to see what I get for the logging table. That way we can see how it would be affected and whether we need to partition by something else or change the PK.

Wed, Sep 18, 3:19 PM · Performance, Core Platform Team Legacy (Watching / External), DBA
Marostegui moved T223151: Review special replica partitioning of certain tables by `xx_user` from Blocked external/Not db team to In progress on the DBA board.
Wed, Sep 18, 3:17 PM · Performance, Core Platform Team Legacy (Watching / External), DBA
Marostegui added a comment to T223151: Review special replica partitioning of certain tables by `xx_user`.

So, I guess we'd need to decide what to do with the logging table on the special replicas as we are already at T233135: Schema change for refactored actor and comment storage.
While 085e3563ed0c might still apply, it is also true that we now have more powerful hardware (more buffer pool) and, especially, faster SSD storage across the fleet.
Removing the partitions isn't the only issue: we also have to modify the PK of the logging table, which currently is:

PRIMARY KEY (`log_id`,`log_user`),
Wed, Sep 18, 3:09 PM · Performance, Core Platform Team Legacy (Watching / External), DBA
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

Also, see T223151: Review special replica partitioning of certain tables by `xx_user` where we discussed this before since we knew this was coming.

Wed, Sep 18, 3:03 PM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

No, MediaWiki knows nothing about the partitioning. That's purely a Wikimedia thing.

Wed, Sep 18, 3:01 PM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

There is another issue: for the logging table, we are dropping the log_user column, which is the one we have the partitions based on:

/*!50100 PARTITION BY RANGE (log_user)
(PARTITION p1 VALUES LESS THAN (1) ENGINE = InnoDB,
 PARTITION p10000 VALUES LESS THAN (10000) ENGINE = InnoDB,
 PARTITION p20000 VALUES LESS THAN (20000) ENGINE = InnoDB,
 PARTITION p30000 VALUES LESS THAN (30000) ENGINE = InnoDB,
 PARTITION p40000 VALUES LESS THAN (40000) ENGINE = InnoDB,
 PARTITION p50000 VALUES LESS THAN (50000) ENGINE = InnoDB,
 PARTITION p75000 VALUES LESS THAN (75000) ENGINE = InnoDB,
 PARTITION p100000 VALUES LESS THAN (100000) ENGINE = InnoDB,
 PARTITION p125000 VALUES LESS THAN (125000) ENGINE = InnoDB,
 PARTITION p150000 VALUES LESS THAN (150000) ENGINE = InnoDB,
 PARTITION p175000 VALUES LESS THAN (175000) ENGINE = InnoDB,
 PARTITION p200000 VALUES LESS THAN (200000) ENGINE = InnoDB,
 PARTITION p250000 VALUES LESS THAN (250000) ENGINE = InnoDB,
 PARTITION p300000 VALUES LESS THAN (300000) ENGINE = InnoDB,
 PARTITION p400000 VALUES LESS THAN (400000) ENGINE = InnoDB,
 PARTITION p500000 VALUES LESS THAN (500000) ENGINE = InnoDB,
 PARTITION p600000 VALUES LESS THAN (600000) ENGINE = InnoDB,
 PARTITION p700000 VALUES LESS THAN (700000) ENGINE = InnoDB,
 PARTITION p800000 VALUES LESS THAN (800000) ENGINE = InnoDB,
 PARTITION p900000 VALUES LESS THAN (900000) ENGINE = InnoDB,
 PARTITION p1000000 VALUES LESS THAN (1000000) ENGINE = InnoDB,
 PARTITION p1500000 VALUES LESS THAN (1500000) ENGINE = InnoDB,
 PARTITION p2000000 VALUES LESS THAN (2000000) ENGINE = InnoDB,
 PARTITION pMAXVALUE VALUES LESS THAN MAXVALUE ENGINE = InnoDB) *
Wed, Sep 18, 2:53 PM · Core Platform Team, Blocked-on-schema-change, DBA
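
To make the dependency on `log_user` concrete: with RANGE partitioning, a row lands in the first partition whose `VALUES LESS THAN` bound is strictly greater than the column value. A minimal sketch of that lookup, using the bounds from the definition above (the function and constant names are hypothetical):

```python
from bisect import bisect_right

# Upper bounds copied from the PARTITION BY RANGE (log_user) definition above.
BOUNDS = [
    1, 10000, 20000, 30000, 40000, 50000, 75000, 100000, 125000, 150000,
    175000, 200000, 250000, 300000, 400000, 500000, 600000, 700000, 800000,
    900000, 1000000, 1500000, 2000000,
]


def partition_for(log_user):
    """Return the name of the partition that would hold this log_user value."""
    # bisect_right finds the first bound strictly greater than log_user.
    index = bisect_right(BOUNDS, log_user)
    if index == len(BOUNDS):
        return "pMAXVALUE"
    return f"p{BOUNDS[index]}"


for value in (0, 1, 9999, 10000, 2000000):
    print(value, partition_for(value))
```

Because the lookup needs the `log_user` value of every row, the column cannot be dropped while the table is still partitioned this way.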
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

Thanks - so I will include a DROP INDEX IF EXISTS usertext_timestamp and will also leave the DROP INDEX IF EXISTS ar_usertext_timestamp so we can cover both situations in production.

Wed, Sep 18, 2:09 PM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui updated the task description for T231172: Alter gbw_reason/gb_reason/gbw_by_text on WMF production.
Wed, Sep 18, 1:30 PM · Blocked-on-schema-change, GlobalBlocking
Marostegui added a comment to T231172: Alter gbw_reason/gb_reason/gbw_by_text on WMF production.

s6 eqiad progress

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1005
  • db1139
  • db1131
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085
  • db1061
Wed, Sep 18, 12:47 PM · Blocked-on-schema-change, GlobalBlocking
Marostegui moved T231172: Alter gbw_reason/gb_reason/gbw_by_text on WMF production from Backlog to In progress on the Blocked-on-schema-change board.
Wed, Sep 18, 12:43 PM · Blocked-on-schema-change, GlobalBlocking
Marostegui updated the task description for T231172: Alter gbw_reason/gb_reason/gbw_by_text on WMF production.
Wed, Sep 18, 12:42 PM · Blocked-on-schema-change, GlobalBlocking
Marostegui added a comment to T231172: Alter gbw_reason/gb_reason/gbw_by_text on WMF production.

I will start with s6 which I will do on codfw master first, and then on each slave on eqiad.
The tables have only 4 rows. If everything goes fine, I will start deploying directly on eqiad masters for the rest of the shards that have empty tables.

Wed, Sep 18, 12:26 PM · Blocked-on-schema-change, GlobalBlocking
Marostegui updated the task description for T231172: Alter gbw_reason/gb_reason/gbw_by_text on WMF production.
Wed, Sep 18, 12:23 PM · Blocked-on-schema-change, GlobalBlocking
Marostegui added a comment to T233207: Decommission dbproxy1006.eqiad.wmnet.

Stopped haproxy to make sure nothing really uses it before decommissioning

root@dbproxy1006:~# systemctl stop haproxy
root@dbproxy1006:~# echo "show stat" | socat /run/haproxy/haproxy.sock stdio
2019/09/18 12:04:10 socat[5642] E connect(5, AF=1 "/run/haproxy/haproxy.sock", 27): Connection refused
Wed, Sep 18, 12:04 PM · DBA
Marostegui moved T233207: Decommission dbproxy1006.eqiad.wmnet from Next to In progress on the DBA board.
Wed, Sep 18, 12:03 PM · DBA
Marostegui updated the task description for T233207: Decommission dbproxy1006.eqiad.wmnet.
Wed, Sep 18, 12:02 PM · DBA
Marostegui updated the task description for T231280: Remove grants for the old dbproxy hosts from the misc databases.
Wed, Sep 18, 11:43 AM · DBA
Marostegui updated the task description for T231967: Decommission dbproxy1005.eqiad.wmnet.
Wed, Sep 18, 11:42 AM · DC-Ops, ops-eqiad, decommission, Operations
Marostegui triaged T233207: Decommission dbproxy1006.eqiad.wmnet as Normal priority.
Wed, Sep 18, 11:42 AM · DBA
Marostegui created T233207: Decommission dbproxy1006.eqiad.wmnet.
Wed, Sep 18, 11:42 AM · DBA
Marostegui added a comment to T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].

dbproxy1014 has been tested and it is now m1-master. In a couple of days I will revert this change as dbproxy1014 is in a rack that requires maintenance to the PDU, so I will move back m1-master to dbproxy1001 until that has passed.
Given that dbproxy1014 works fine, dbproxy1006 can go away.

Wed, Sep 18, 10:29 AM · Patch-For-Review, DBA
Marostegui updated the task description for T231172: Alter gbw_reason/gb_reason/gbw_by_text on WMF production.
Wed, Sep 18, 9:01 AM · Blocked-on-schema-change, GlobalBlocking
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

s6 db2089:3316
I found this error on the three wikis that live there: frwiki jawiki ruwiki

ERROR 1091 (42000) at line 37: Can't DROP 'ar_usertext_timestamp'; check that column/key exists
Wed, Sep 18, 8:52 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui updated the task description for T226778: Install new PDUs in rows A/B (Top level tracking task).
Wed, Sep 18, 8:32 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T226778: Install new PDUs in rows A/B (Top level tracking task).
Wed, Sep 18, 8:27 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T226778: Install new PDUs in rows A/B (Top level tracking task).
Wed, Sep 18, 8:26 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T226778: Install new PDUs in rows A/B (Top level tracking task).
Wed, Sep 18, 8:23 AM · DC-Ops, Operations, ops-eqiad
Marostegui triaged T233187: Drop frwiki.archive_save table as Normal priority.
Wed, Sep 18, 6:30 AM · DBA
Marostegui created T233187: Drop frwiki.archive_save table.
Wed, Sep 18, 6:30 AM · DBA
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

Mentioned in SAL (#wikimedia-operations) [2019-09-18T05:58:05Z] <marostegui> Deploy schema change on db2097:3316 - T233135

Wed, Sep 18, 6:01 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

I want to alter s6 first, host by host. If everything goes fine, I will later do codfw all at once with replication. I am also interested in seeing how much disk space we free up by removing all these columns and indexes.

Wed, Sep 18, 5:53 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui updated the task description for T228258: Decommission db2043-db2069.
Wed, Sep 18, 5:49 AM · Operations, DBA
Marostegui updated the task description for T233186: Decommission db2055.codfw.wmnet.
Wed, Sep 18, 5:46 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui updated the task description for T233186: Decommission db2055.codfw.wmnet.
Wed, Sep 18, 5:29 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Wed, Sep 18, 5:28 AM · Operations, DBA
Marostegui moved T233186: Decommission db2055.codfw.wmnet from Triage to In progress on the DBA board.
Wed, Sep 18, 5:23 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui triaged T233186: Decommission db2055.codfw.wmnet as Normal priority.
Wed, Sep 18, 5:23 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui created T233186: Decommission db2055.codfw.wmnet.
Wed, Sep 18, 5:22 AM · decommission, DC-Ops, ops-codfw, Operations
Marostegui updated the task description for T228258: Decommission db2043-db2069.
Wed, Sep 18, 5:20 AM · Operations, DBA
Marostegui changed the status of T233185: Decommission db2067.codfw.wmnet, a subtask of T228258: Decommission db2043-db2069, from Open to Stalled.
Wed, Sep 18, 5:20 AM · Operations, DBA
Marostegui changed the status of T233185: Decommission db2067.codfw.wmnet from Open to Stalled.

This host is now working as m2 master in codfw, and will be replaced once the new misc hosts are bought.

Wed, Sep 18, 5:20 AM · DBA
Marostegui created T233185: Decommission db2067.codfw.wmnet.
Wed, Sep 18, 5:19 AM · DBA
Marostegui updated the task description for T233135: Schema change for refactored actor and comment storage.
Wed, Sep 18, 5:08 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui updated the task description for T233135: Schema change for refactored actor and comment storage.
Wed, Sep 18, 5:06 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui moved T233135: Schema change for refactored actor and comment storage from Blocked external/Not db team to In progress on the DBA board.
Wed, Sep 18, 5:05 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui triaged T233184: db2127 memory issues as Normal priority.

We are leaving this task open for a few days to see if the errors come back.

Wed, Sep 18, 5:03 AM · ops-codfw, DBA, Operations
Marostegui created T233184: db2127 memory issues.
Wed, Sep 18, 5:02 AM · ops-codfw, DBA, Operations
Marostegui moved T233135: Schema change for refactored actor and comment storage from Backlog to In progress on the Blocked-on-schema-change board.
Wed, Sep 18, 4:49 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui moved T233135: Schema change for refactored actor and comment storage from In progress to Blocked external/Not db team on the DBA board.
Wed, Sep 18, 4:49 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui claimed T233135: Schema change for refactored actor and comment storage.

Yay!!

Wed, Sep 18, 4:49 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui closed T232491: Numerous people reporting issues saving edits and viewing previews/diffs as Resolved.

Going to close this for now. Feel free to reopen if needed.

Wed, Sep 18, 4:45 AM · netops, Traffic, Wikimedia-General-or-Unknown, Operations

Tue, Sep 17

Marostegui added a comment to P9119 (An Untitled Masterwork).

Sweeeeeet

Tue, Sep 17, 3:57 PM
Marostegui added a comment to P9119 (An Untitled Masterwork).
In P9119#54758, @Papaul wrote:

@Marostegui it looks like a reboot cleared the DIMM error for now. We can leave the task open for a week and see.
Thanks.

Tue, Sep 17, 3:51 PM
Marostegui created P9119 (An Untitled Masterwork).
Tue, Sep 17, 3:13 PM
Marostegui added a comment to T227539: b3-eqiad pdu refresh (Tuesday 9/17 @11am UTC).

All the DBs have been downtimed and depooled, and replication has been stopped. From the DBAs' point of view, this maintenance is good to go.

Tue, Sep 17, 9:54 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T231638: db1074 crashed: Broken BBU.

The BBU showed up again (usual behaviour with a broken BBU)

root@db1074:~# hpssacli controller all show status
Tue, Sep 17, 8:03 AM · ops-eqiad, Operations, DBA
Marostegui added a comment to T231638: db1074 crashed: Broken BBU.

This host's original weight was 200 in main traffic and 1 in API. I have only pooled it with weight 50 on main traffic, just to get it to do something.

Tue, Sep 17, 7:59 AM · ops-eqiad, Operations, DBA
Marostegui updated the task description for T217396: Decommission db1061-db1073.
Tue, Sep 17, 7:48 AM · Operations, DBA
Marostegui moved T232564: Decommission db1063.eqiad.wmnet from Backlog to Ready for Decommission on the decommission board.

Host ready for DC-Ops to decommission

Tue, Sep 17, 7:48 AM · DC-Ops, decommission, ops-eqiad, Operations
Marostegui reassigned T232564: Decommission db1063.eqiad.wmnet from Marostegui to RobH.
Tue, Sep 17, 7:48 AM · DC-Ops, decommission, ops-eqiad, Operations
Marostegui updated the task description for T232564: Decommission db1063.eqiad.wmnet.
Tue, Sep 17, 7:46 AM · DC-Ops, decommission, ops-eqiad, Operations
Marostegui updated the task description for T232564: Decommission db1063.eqiad.wmnet.
Tue, Sep 17, 7:39 AM · DC-Ops, decommission, ops-eqiad, Operations
Marostegui updated the task description for T217396: Decommission db1061-db1073.
Tue, Sep 17, 6:36 AM · Operations, DBA
Marostegui updated the task description for T186188: Failover DB masters in row D.
Tue, Sep 17, 6:01 AM · DBA
Marostegui added a comment to T227142: a6-eqiad pdu refresh (Tuesday 10/22 @11am UTC).

@elukey for labsdb1012 your team would need to let us know if MySQL can be stopped for this maintenance (just in case there is a power loss, it is better to have MySQL stopped, as labs hosts do not have GTID enabled and the risk of corruption can be higher).

We can definitely stop mysql on it, we need labsdb up and running for jobs at the beginning of the month :)

Tue, Sep 17, 5:57 AM · DC-Ops, Operations, ops-eqiad