
Marostegui (Manuel Aróstegui)
User

User Details

User Since
Sep 1 2016, 6:48 AM (163 w, 16 h)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF)

TZ: UTC +1/+2

Recent Activity

Today

Marostegui added a comment to T235695: Degraded RAID on db2067.

Thanks!

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding)
Thu, Oct 17, 2:57 PM · DBA, Operations, ops-codfw
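The quoted controller line carries the drive's location, interface, size and state. A minimal sketch for pulling the state out of such a line, assuming the exact layout quoted above; the function names are hypothetical, and treating any state other than "OK" as worth a look is an assumed policy, not how the RAID alerting actually works:

```python
import re

# Pattern for one physical-drive line in the layout quoted above (assumed format):
# "physicaldrive <id> (port <p>:box <b>:bay <n>, <iface>, <size>, <status>)"
DRIVE_LINE = re.compile(
    r"physicaldrive\s+(?P<drive>\S+)\s+\("
    r"port\s+(?P<port>[^:]+):box\s+(?P<box>\d+):bay\s+(?P<bay>\d+),\s*"
    r"(?P<iface>[^,]+),\s*(?P<size>[^,]+),\s*(?P<status>[^)]+)\)"
)


def parse_drive_line(line: str) -> dict:
    """Return the fields of a physical-drive line, or raise ValueError."""
    match = DRIVE_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognised drive line: {line!r}")
    return {key: value.strip() for key, value in match.groupdict().items()}


def needs_attention(line: str) -> bool:
    """Hypothetical policy: any state other than 'OK' deserves a look."""
    return parse_drive_line(line)["status"] != "OK"


if __name__ == "__main__":
    sample = "physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, Rebuilding)"
    info = parse_drive_line(sample)
    print(info["drive"], info["status"])  # 1I:1:1 Rebuilding
    print(needs_attention(sample))        # True
```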
Marostegui updated the task description for T235599: Recompress special slaves across eqiad and codfw.
Thu, Oct 17, 12:48 PM · DBA
Marostegui updated the task description for T232446: Compress new Wikibase tables.
Thu, Oct 17, 12:48 PM · DBA
Marostegui moved T235743: Prepare and check storage layer for mnwwiki from Triage to Blocked external/Not db team on the DBA board.

Let us know when the database is created so we can sanitize its tables and hand over to WMCS for views creation.

Thu, Oct 17, 10:47 AM · cloud-services-team (Kanban), Data-Services, DBA, Operations
Marostegui added a comment to T227133: a8-eqiad pdu refresh (Thursday 10/17 @11am UTC).

db1129 and db1117 are good to go.

Thu, Oct 17, 9:39 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T227355: DBA review for the MachineVision extension.

I am going on holiday in a few days and will be away until the 11th of November, so I am not sure I will be able to tackle this before I leave. I would actually prefer that @jcrespo give it a final look, as he started on this months ago. If that is not possible, I will see what I can do before I leave.

Thu, Oct 17, 7:55 AM · DBA, Product-Infrastructure-Team-Backlog, Machine vision
Marostegui updated the task description for T231018: specify group (api/vslow/etc) weights in terms of 0..100 instead of 0..1.
Thu, Oct 17, 6:03 AM · conftool, DBA
Marostegui added a comment to T230862: Create a way to filter only WB-related changes from Commons recentchanges.

Using tags has the advantage of more directly identifying the relevant revisions, if the planner decides that gathering all revisions with the tag and then filtering by which are in RC (and probably filesorting) is a better plan than taking rows from RC (in order) and filtering by which have the tag. The equivalent query against the slots table is much less likely to be workable, since it would have to fetch all revisions with the slot rather than all revisions actually changing the slot. On the other hand, I'm skeptical that, long-term, there would be few enough revisions so tagged for the tag-first plan to actually be better.
If the planner is going with an RC-first plan, then it seems unlikely to matter much whether it filters by joining change_tag or joining slots. There's no always-good option here, and the maybe-bad parts are the same for both.
Personally I'd lean against adding lots of rows to change_tag, which will be visible in various UIs, if the only use case is filtering on slots edited. Using a wider index (bigint+bigint+smallint rather than bigint+int) for a filtering join seems less troublesome than adding more to an already-tall table. But you might want to ask the DBAs for their opinion.

Thu, Oct 17, 5:46 AM · Patch-For-Review, Structured Data Engineering, Structured-Data-Backlog, MediaWiki-API, Wikidata-Query-Service, SDC General, Commons, Wikidata
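The quoted comment weighs a wider index (bigint+bigint+smallint) against a bigint+int one. A rough sketch of the raw key width per index entry, using the fixed storage sizes of MySQL/MariaDB integer types (BIGINT 8 bytes, INT 4, SMALLINT 2). It ignores page and row overhead and the primary key that InnoDB appends to secondary index entries, and the entry count used for scale is purely hypothetical:

```python
# Fixed storage sizes, in bytes, of MySQL/MariaDB integer column types.
INT_SIZES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def key_width(columns: list) -> int:
    """Sum of the raw column widths in a composite index key (no overhead)."""
    return sum(INT_SIZES[col] for col in columns)


if __name__ == "__main__":
    wide = key_width(["bigint", "bigint", "smallint"])  # 8 + 8 + 2
    narrow = key_width(["bigint", "int"])               # 8 + 4
    rows = 100_000_000  # hypothetical number of index entries, not a figure from the task
    extra_mib = (wide - narrow) * rows / 2**20
    print(wide, narrow, f"{extra_mib:.0f} MiB")         # 18 12 572 MiB
```

Six extra bytes per entry is small next to the cost of adding a whole new set of rows to change_tag, which is the trade-off the comment is pointing at.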
Marostegui moved T230883: Decommission db2052.codfw.wmnet from Ready for Decommission to pending onsite steps (codfw) on the decommission board.

Host ready for switch disablement + onsite steps

Thu, Oct 17, 5:40 AM · DC-Ops, ops-codfw, decommission, Operations
Marostegui reassigned T230883: Decommission db2052.codfw.wmnet from RobH to Papaul.
Thu, Oct 17, 5:40 AM · DC-Ops, ops-codfw, decommission, Operations
Marostegui updated the task description for T234066: Schema change to rename user_newtalk indexes.
Thu, Oct 17, 5:30 AM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T233135: Schema change for refactored actor and comment storage.
Thu, Oct 17, 5:30 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui updated the task description for T233135: Schema change for refactored actor and comment storage.
Thu, Oct 17, 5:25 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui triaged T235695: Degraded RAID on db2067 as Normal priority.
Thu, Oct 17, 5:09 AM · DBA, Operations, ops-codfw
Marostegui assigned T235695: Degraded RAID on db2067 to Papaul.

@Papaul can we replace the failed disk with a brand new one?
This host is scheduled for decommissioning, but we have to buy its replacement first (T234608).

Thu, Oct 17, 5:09 AM · DBA, Operations, ops-codfw
Marostegui added a comment to T235697: labsdb1011 missing tools user accounts.

I'm interested in whether @Marostegui has any thoughts, because I think this is caused by cloning labsdb1012, which doesn't currently get the user databases. That system wasn't included in the rotation on purpose, but if we are going to use it in the future for cloning like this, we may want to include it.
A repair function seems sensible if there isn't a clean way to copy the user database. I do not know if that would/could copy over max_user_connections and similar things as well, but we could try to capture it.

Thu, Oct 17, 5:07 AM · Data-Services, cloud-services-team (Kanban)
Marostegui updated the task description for T233135: Schema change for refactored actor and comment storage.
Thu, Oct 17, 5:04 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui updated the task description for T234066: Schema change to rename user_newtalk indexes.
Thu, Oct 17, 5:02 AM · Blocked-on-schema-change, DBA

Yesterday

Marostegui added a comment to T233236: Move labtestwikitech database to clouddb2001-dev.

OK, and I understand that if we ever need that one in eqiad, it will also be owned by WMCS?

Wed, Oct 16, 5:19 PM · Patch-For-Review, cloud-services-team (Kanban)
Marostegui added a comment to T233236: Move labtestwikitech database to clouddb2001-dev.

The answer is "no" if handled by us, because that would be a total snowflake in our infrastructure: at the moment we only have writable databases in eqiad that are replicated to codfw, regardless of whether they are MW databases or misc databases.
Having a writable database in codfw breaks our consistency, because we'd have to manage two databases with the same name on different servers, written with different data depending on the DC. And again (this hasn't been answered yet), I assume you want to have a copy of those databases in the opposite DC.

We have no need at all for a copy of the labtestwiki database in eqiad, only codfw. As a testing only wiki with no active community we can get by with periodic dumps of the database for recovery.

Wed, Oct 16, 5:13 PM · Patch-For-Review, cloud-services-team (Kanban)
Marostegui added a comment to T153815: Allow global groups to be assigned temporarily (expire).

Yes, once that patchset is reviewed and merged, you can follow: https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change

Wed, Oct 16, 3:11 PM · Schema-change, Patch-For-Review, Stewards-and-global-tools (Temporary-UserRights), MediaWiki-extensions-CentralAuth
Marostegui added a comment to T223151: Review special replica partitioning of certain tables by `xx_user`.

I have done a couple of tests, one on enwiki and another one on dewiki.

Wed, Oct 16, 2:56 PM · mariadb-optimizer-bug, Core Platform Team, MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance Issue, DBA
Marostegui added a comment to T233236: Move labtestwikitech database to clouddb2001-dev.

By having a writable database in codfw, you effectively have a split brain with the labtestwiki that is being written in eqiad, and you'd no longer be able to have cross-DC replication (unless you also plan to have a new host replicating that database into a read-only eqiad database for redundancy).

The only place that labtestwiki runs is in codfw. There is no MediaWiki deployment in eqiad that is connected to the labtestwiki database. The closest analog of labtestwiki for the main Wikimedia cluster is the wikifarm inside of the deployment-prep Cloud VPS project. Similarly labwiki only exists in eqiad with no counterpart in codfw. When a DC switch is performed, wikitech remains running from the labweb* hosts in eqiad.

Wed, Oct 16, 2:40 PM · Patch-For-Review, cloud-services-team (Kanban)
Marostegui added a comment to T223151: Review special replica partitioning of certain tables by `xx_user`.

So we've got rid of all the partitions on logging at T233625.
Next is revision.
I think I am going to capture some of the traffic that the special slaves receive for the revision table and replay it on a non-partitioned replica, to see what we get in terms of slow queries as a first approach.

Wed, Oct 16, 12:24 PM · mariadb-optimizer-bug, Core Platform Team, MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance Issue, DBA
Marostegui closed T233625: Change PK and remove partitions from the logging table, a subtask of T223151: Review special replica partitioning of certain tables by `xx_user`, as Resolved.
Wed, Oct 16, 12:19 PM · mariadb-optimizer-bug, Core Platform Team, MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance Issue, DBA
Marostegui closed T233625: Change PK and remove partitions from the logging table, a subtask of T233135: Schema change for refactored actor and comment storage, as Resolved.
Wed, Oct 16, 12:19 PM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui closed T233625: Change PK and remove partitions from the logging table as Resolved.

All partitions removed from the logging table of all the special slaves.

Wed, Oct 16, 12:19 PM · DBA
Marostegui updated the task description for T233625: Change PK and remove partitions from the logging table.
Wed, Oct 16, 12:18 PM · DBA
Marostegui updated the task description for T233625: Change PK and remove partitions from the logging table.
Wed, Oct 16, 12:18 PM · DBA
Marostegui updated the task description for T234704: Remove ar_comment from sanitarium triggers.
Wed, Oct 16, 10:21 AM · DBA
Marostegui added a comment to T234948: New Wikibase deadlocks on Wikidata wiki since 2019-10-08T00:00:02: Wikibase\Lib\Store\Sql\Terms\{closure} Deadlock found when trying to get lock; try restarting transaction.

That's indeed not ideal (the fact that the transaction isn't retried), and it probably merits some investigation and some mitigation, as deadlocks can happen at any time (ideally not often, but they are always a possibility in high-concurrency environments).

Wed, Oct 16, 9:51 AM · User-Ladsgroup, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikimedia-production-error, Wikimedia-database-error, Wikidata
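The comment above notes that the deadlocked write is not retried. A minimal, generic sketch of retry-on-deadlock follows; it is not Wikibase's or MediaWiki's actual code, and `DatabaseError` and `run_with_deadlock_retry` are hypothetical names. Error 1213 is the MySQL/MariaDB code for "Deadlock found when trying to get lock". Because InnoDB rolls back the whole victim transaction on a deadlock, the callable must redo the entire transaction, not just the last statement:

```python
import random
import time

ER_LOCK_DEADLOCK = 1213  # MySQL/MariaDB: "Deadlock found when trying to get lock"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL error code."""

    def __init__(self, errno: int, message: str = ""):
        super().__init__(message)
        self.errno = errno


def run_with_deadlock_retry(transaction, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run transaction() and re-run it from scratch if it fails with a deadlock.

    Any other error, or a deadlock on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DatabaseError as err:
            if err.errno != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Exponential backoff with jitter, so the competing transactions
            # are less likely to collide again straight away.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Fails with a deadlock twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise DatabaseError(ER_LOCK_DEADLOCK, "Deadlock found")
        return "committed"

    print(run_with_deadlock_retry(flaky, sleep=lambda s: None), calls["n"])  # committed 3
```

Keeping the attempt count small matters: if the same rows are contended constantly, retries only add load, which is why investigating the source of the deadlocks is still worthwhile.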
Marostegui added a comment to T234948: New Wikibase deadlocks on Wikidata wiki since 2019-10-08T00:00:02: Wikibase\Lib\Store\Sql\Terms\{closure} Deadlock found when trying to get lock; try restarting transaction.

That link doesn't work for me :(

Wed, Oct 16, 8:34 AM · User-Ladsgroup, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikimedia-production-error, Wikimedia-database-error, Wikidata
Marostegui claimed T235599: Recompress special slaves across eqiad and codfw.
Wed, Oct 16, 7:14 AM · DBA
Marostegui created T235599: Recompress special slaves across eqiad and codfw.
Wed, Oct 16, 7:14 AM · DBA
Marostegui updated the task description for T234066: Schema change to rename user_newtalk indexes.
Wed, Oct 16, 5:57 AM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T233135: Schema change for refactored actor and comment storage.
Wed, Oct 16, 5:56 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui added a comment to T233236: Move labtestwikitech database to clouddb2001-dev.

The big picture is: labtestwikitech needs a database. That database needs to be read/write.
labtestwikitech currently /has/ a database, in the m5 cluster, but it's not read/write from codfw and hence largely useless.
I don't care in the least where the database is or how it's managed; I just need one. I expect to just make it myself on a server that's ignored by the DBAs. Pointing a given wiki to an arbitrary database used to be simple (and I've done it several times), but since I last visited the wmf-config code things have become very different and abstract, so I no longer know how to tell it "labtestwikitech uses database named 'foo' on server 'bar'." The answer to that last question is literally all I need here, although of course any other help is welcome.

Wed, Oct 16, 5:50 AM · Patch-For-Review, cloud-services-team (Kanban)
Marostegui moved T230885: Decommission db2066.codfw.wmnet from Ready for Decommission to pending onsite steps (codfw) on the decommission board.

Host ready for on-site steps + switch disablement

Wed, Oct 16, 5:29 AM · DC-Ops, ops-codfw, decommission, Operations
Marostegui reassigned T230885: Decommission db2066.codfw.wmnet from RobH to Papaul.
Wed, Oct 16, 5:28 AM · DC-Ops, ops-codfw, decommission, Operations
Marostegui added a comment to T234066: Schema change to rename user_newtalk indexes.

s2 eqiad progress

Wed, Oct 16, 5:17 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T234704: Remove ar_comment from sanitarium triggers.
Wed, Oct 16, 5:16 AM · DBA
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

s2 eqiad progress

Wed, Oct 16, 5:16 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui updated the task description for T234704: Remove ar_comment from sanitarium triggers.
Wed, Oct 16, 5:15 AM · DBA
Marostegui updated the task description for T233625: Change PK and remove partitions from the logging table.
Wed, Oct 16, 5:05 AM · DBA

Tue, Oct 15

Marostegui added a comment to T233236: Move labtestwikitech database to clouddb2001-dev.

Hey @Andrew can you specify a bit more what's your plan with the new database on the codfw instance?
Right now the labtestwiki is located in eqiad and codfw (on a RO host, as you know). Are you planning to have a writable version in codfw? What's the plan then for the existing one in codfw? And moreover, how are you going to replicate those changes to the eqiad one?
Are those two DBs going to become independent DBs? If so, how are you planning to have DC redundancy?

Tue, Oct 15, 3:42 PM · Patch-For-Review, cloud-services-team (Kanban)
Marostegui updated the task description for T233625: Change PK and remove partitions from the logging table.
Tue, Oct 15, 11:46 AM · DBA
Marostegui created T235475: Community Relations support needed for a read-only window for s6 (frwiki, jawiki, ruwiki).
Tue, Oct 15, 8:24 AM · CommRel-Specialists-Support (Oct-Dec-2019)
Marostegui updated the task description for T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).
Tue, Oct 15, 8:09 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a comment to T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).

db1126 and labsdb1009 are ok to proceed.
Note: db1069 is powered OFF as it is pending on-site decommissioning steps. DO NOT power it back on.

Tue, Oct 15, 8:09 AM · DC-Ops, Operations, ops-eqiad
Marostegui added a parent task for T234800: Switchover s1 primary database master db1067 -> db1083 - 14th Nov 05:00 - 05:30 UTC: T210713: Drop change_tag.ct_tag column in production.
Tue, Oct 15, 7:52 AM · Operations, DBA
Marostegui added a subtask for T210713: Drop change_tag.ct_tag column in production: T234800: Switchover s1 primary database master db1067 -> db1083 - 14th Nov 05:00 - 05:30 UTC.
Tue, Oct 15, 7:52 AM · Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui moved T234066: Schema change to rename user_newtalk indexes from Backlog to In progress on the Blocked-on-schema-change board.
Tue, Oct 15, 7:51 AM · Blocked-on-schema-change, DBA
Marostegui updated the task description for T220002: Decommission dbstore1001, dbstore2001, dbstore2002.
Tue, Oct 15, 7:45 AM · DC-Ops, decommission, Goal, DBA
Marostegui moved T230778: Decommission db2051.codfw.wmnet from Ready for Decommission to pending onsite steps (codfw) on the decommission board.

Host ready for onsite steps + switch disablement

Tue, Oct 15, 7:38 AM · DC-Ops, ops-codfw, decommission, Operations
Marostegui reassigned T230778: Decommission db2051.codfw.wmnet from RobH to Papaul.
Tue, Oct 15, 7:38 AM · DC-Ops, ops-codfw, decommission, Operations
Marostegui updated the task description for T226782: a1-eqiad pdu refresh (Tuesday 10/15 @11am UTC).
Tue, Oct 15, 7:05 AM · DC-Ops, Operations, ops-eqiad
Marostegui updated the task description for T235464: decommission db1070.eqiad.wmnet.
Tue, Oct 15, 6:38 AM · Operations, DBA
Marostegui updated the task description for T235464: decommission db1070.eqiad.wmnet.
Tue, Oct 15, 6:37 AM · Operations, DBA
Marostegui triaged T235464: decommission db1070.eqiad.wmnet as Normal priority.
Tue, Oct 15, 6:34 AM · Operations, DBA
Marostegui added a subtask for T217396: Decommission db1061-db1073: T235469: Switchover s6 primary database master db1061 -> db1131 - 19th Nov 05:00 - 05:30 UTC.
Tue, Oct 15, 6:33 AM · Operations, DBA
Marostegui added a parent task for T235469: Switchover s6 primary database master db1061 -> db1131 - 19th Nov 05:00 - 05:30 UTC: T217396: Decommission db1061-db1073.
Tue, Oct 15, 6:33 AM · DBA
Marostegui triaged T235469: Switchover s6 primary database master db1061 -> db1131 - 19th Nov 05:00 - 05:30 UTC as Normal priority.
Tue, Oct 15, 6:33 AM · DBA
Marostegui moved T235469: Switchover s6 primary database master db1061 -> db1131 - 19th Nov 05:00 - 05:30 UTC from Triage to Next on the DBA board.
Tue, Oct 15, 6:33 AM · DBA
Marostegui created T235469: Switchover s6 primary database master db1061 -> db1131 - 19th Nov 05:00 - 05:30 UTC.
Tue, Oct 15, 6:32 AM · DBA
Marostegui added a comment to T234948: New Wikibase deadlocks on Wikidata wiki since 2019-10-08T00:00:02: Wikibase\Lib\Store\Sql\Terms\{closure} Deadlock found when trying to get lock; try restarting transaction.

I have merged your change and disabled the MostLinked cronjob entry.
Thank you!

Tue, Oct 15, 5:58 AM · User-Ladsgroup, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikimedia-production-error, Wikimedia-database-error, Wikidata
Marostegui placed T234770: Prepare and check storage layer for banwiki up for grabs.

db1124, db2094, labsdb1009, labsdb1010, labsdb1011, labsdb1012 are clean.
I have created the database on all the hosts:

banwiki_p
Tue, Oct 15, 5:44 AM · cloud-services-team (Kanban), Data-Services, DBA, Operations
Marostegui updated the task description for T217396: Decommission db1061-db1073.
Tue, Oct 15, 5:24 AM · Operations, DBA
Marostegui added a subtask for T217396: Decommission db1061-db1073: T235464: decommission db1070.eqiad.wmnet.
Tue, Oct 15, 5:24 AM · Operations, DBA
Marostegui added a parent task for T235464: decommission db1070.eqiad.wmnet: T217396: Decommission db1061-db1073.
Tue, Oct 15, 5:24 AM · Operations, DBA
Marostegui claimed T235464: decommission db1070.eqiad.wmnet.
Tue, Oct 15, 5:24 AM · Operations, DBA
Marostegui created T235464: decommission db1070.eqiad.wmnet.
Tue, Oct 15, 5:24 AM · Operations, DBA
Marostegui closed T234300: Switchover s5 primary database master db1070 -> db1100 - 15th Oct 05:00 - 05:30 UTC, a subtask of T217396: Decommission db1061-db1073, as Resolved.
Tue, Oct 15, 5:22 AM · Operations, DBA
Marostegui closed T234300: Switchover s5 primary database master db1070 -> db1100 - 15th Oct 05:00 - 05:30 UTC, a subtask of T186188: Failover DB masters in row D, as Resolved.
Tue, Oct 15, 5:22 AM · DBA
Marostegui closed T234300: Switchover s5 primary database master db1070 -> db1100 - 15th Oct 05:00 - 05:30 UTC as Resolved.
Tue, Oct 15, 5:22 AM · DBA
Marostegui updated the task description for T186188: Failover DB masters in row D.
Tue, Oct 15, 5:09 AM · DBA
Marostegui closed T234303: Community Relations support needed a several read-only window for s5, a subtask of T234300: Switchover s5 primary database master db1070 -> db1100 - 15th Oct 05:00 - 05:30 UTC, as Resolved.
Tue, Oct 15, 5:07 AM · DBA
Marostegui closed T234303: Community Relations support needed a several read-only window for s5 as Resolved.

This was done
read only start: 05:00:17
read only stop: 05:00:43

Tue, Oct 15, 5:07 AM · CommRel-Specialists-Support (Oct-Dec-2019)
Marostegui added a comment to T234300: Switchover s5 primary database master db1070 -> db1100 - 15th Oct 05:00 - 05:30 UTC.

This was done
read only start: 05:00:17
read only stop: 05:00:43

Tue, Oct 15, 5:07 AM · DBA
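The s5 switchover kept the section read-only from 05:00:17 to 05:00:43 UTC. A small sketch computing the length of that window, assuming both timestamps fall on the same day (true here, well inside the scheduled 05:00-05:30 UTC slot); `window_seconds` is a hypothetical helper name:

```python
from datetime import datetime


def window_seconds(start: str, stop: str) -> int:
    """Length in seconds of a window given two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


if __name__ == "__main__":
    print(window_seconds("05:00:17", "05:00:43"))  # 26
```

So the actual read-only time was 26 seconds out of the 30-minute window announced for the switchover.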
Marostegui updated the task description for T233625: Change PK and remove partitions from the logging table.
Tue, Oct 15, 4:37 AM · DBA
Marostegui updated the task description for T233625: Change PK and remove partitions from the logging table.
Tue, Oct 15, 4:36 AM · DBA
Marostegui reopened T227967: mr1-eqsin.oob IPv6 connectivity flapping as "Open".

This has been flapping overnight (times in UTC+2):

Tue, Oct 15, 4:24 AM · Operations, netops

Mon, Oct 14

Marostegui moved T234770: Prepare and check storage layer for banwiki from Blocked external/Not db team to In progress on the DBA board.
Mon, Oct 14, 1:21 PM · cloud-services-team (Kanban), Data-Services, DBA, Operations
Marostegui added a comment to T234770: Prepare and check storage layer for banwiki.

I have sanitized this wiki, but before adding the grants and creating the _p database I am running a check to make sure all the information is sanitized.
The triggers seem to be working as expected, as my user was sanitized correctly.

Mon, Oct 14, 1:19 PM · cloud-services-team (Kanban), Data-Services, DBA, Operations
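The comment describes checking that the sanitized banwiki copy holds no private data before grants and the _p database are added. A minimal sketch of that kind of check, run over rows already fetched from a replica; the column list is an assumed subset of MediaWiki `user` columns chosen for illustration, not the actual redaction list, and the sample rows are made up:

```python
# Assumed subset of private columns, for illustration only;
# the real redaction list is maintained elsewhere.
PRIVATE_COLUMNS = ("user_password", "user_newpassword", "user_email", "user_token")


def unsanitized_rows(rows):
    """Yield (user_id, column) for every private column that still holds data.

    A missing column or an empty/NULL value counts as sanitized here.
    """
    for row in rows:
        for column in PRIVATE_COLUMNS:
            value = row.get(column)
            if value not in (None, b"", ""):
                yield row["user_id"], column


if __name__ == "__main__":
    sample = [  # made-up rows, not real data
        {"user_id": 1, "user_password": b"", "user_newpassword": b"", "user_email": b"", "user_token": b""},
        {"user_id": 2, "user_password": b":B:abc", "user_newpassword": b"", "user_email": b"", "user_token": b""},
    ]
    print(list(unsanitized_rows(sample)))  # [(2, 'user_password')]
```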
Marostegui added a comment to T233273: labsdb1009 broken PSU.

Thanks John!
The alert recovered:

Mon, Oct 14, 1:15 PM · Operations, DC-Ops, ops-eqiad, DBA
Marostegui updated the task description for T233625: Change PK and remove partitions from the logging table.
Mon, Oct 14, 1:01 PM · DBA
Marostegui added a comment to T234948: New Wikibase deadlocks on Wikidata wiki since 2019-10-08T00:00:02: Wikibase\Lib\Store\Sql\Terms\{closure} Deadlock found when trying to get lock; try restarting transaction.

Could we actually live, until the migration is done, with the Most Linked special page on Wikidata not being fully up to date for about a month?
@Lydia_Pintscher let's discuss this on Monday. I think the results on the first page of most linked pages, which I guess are what readers/editors/others are most interested in, probably do not change all that often?

We're talking about https://www.wikidata.org/wiki/Special:MostLinkedPages not being updated for about a month during the migration and then updating regularly again? Yeah that should be fine.

@Lydia_Pintscher It's not just the special page, it has an API counter-part.
To make matters even more interesting, we can disable either none or all of these for Wikidata: fewestrevisions, wantedpages, mostrevisions, mostlinked, deadendpages, ancientpages. If that's a no-no, I can make a patch in puppet, but it will be complex and big.

Mon, Oct 14, 12:57 PM · User-Ladsgroup, Patch-For-Review, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikimedia-production-error, Wikimedia-database-error, Wikidata
Marostegui claimed T234770: Prepare and check storage layer for banwiki.
Mon, Oct 14, 12:52 PM · cloud-services-team (Kanban), Data-Services, DBA, Operations
Marostegui added a comment to T233273: labsdb1009 broken PSU.

@Jclark-ctr you can proceed and change the PSU now. MySQL has been stopped.

Mon, Oct 14, 9:22 AM · Operations, DC-Ops, ops-eqiad, DBA
Marostegui added a comment to T234066: Schema change to rename user_newtalk indexes.

s7 eqiad progress

Mon, Oct 14, 9:13 AM · Blocked-on-schema-change, DBA
Marostegui added a comment to T233135: Schema change for refactored actor and comment storage.

s7 eqiad progress

Mon, Oct 14, 9:13 AM · Core Platform Team, Blocked-on-schema-change, DBA
Marostegui closed T231638: db1074 crashed: Broken BBU as Resolved.

db1125:3312 has been moved under db1074 with the following coordinates (GTID also enabled):

change master to master_host='db1074.eqiad.wmnet', master_user='repl', master_password='x', master_port=3306, MASTER_SSL=1, master_log_pos=388898652, master_log_file='db1074-bin.004543';
Mon, Oct 14, 8:53 AM · ops-eqiad, Operations, DBA
Marostegui updated the task description for T232446: Compress new Wikibase tables.
Mon, Oct 14, 8:00 AM · DBA
Marostegui created P9321 (An Untitled Masterwork).
Mon, Oct 14, 7:59 AM
Marostegui closed T235366: db2068 is misbehaving (but is depooled) as Resolved.

Resolving this as the host has been labelled as broken and sent to DC-Ops for decommissioning T235399

Mon, Oct 14, 7:17 AM · Operations, DBA
Marostegui moved T235399: Decommission db2068.codfw.wmnet from Backlog to pending onsite steps (codfw) on the decommission board.
Mon, Oct 14, 7:15 AM · Patch-For-Review, DC-Ops, ops-codfw, decommission, Operations
Marostegui updated the task description for T228258: Decommission db2043-db2069.
Mon, Oct 14, 7:14 AM · Operations, DBA
Marostegui added a project to T235399: Decommission db2068.codfw.wmnet: DC-Ops.

Host ready for DC-Ops steps

Mon, Oct 14, 7:09 AM · Patch-For-Review, DC-Ops, ops-codfw, decommission, Operations
Marostegui reassigned T235399: Decommission db2068.codfw.wmnet from Marostegui to Papaul.
Mon, Oct 14, 7:08 AM · Patch-For-Review, DC-Ops, ops-codfw, decommission, Operations
Marostegui updated subscribers of T235399: Decommission db2068.codfw.wmnet.

cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: db2068.codfw.wmnet

  • db2068.codfw.wmnet (FAIL)
    • Downtimed host on Icinga
    • Downtimed management interface on Icinga
    • Unable to connect to the host, wipe of bootloaders will not be performed: Cumin execution failed (exit_code=2)
    • Powered off
    • Set Netbox status to Decommissioning
    • Removed from DebMonitor
    • Removed from Puppet master and PuppetDB

ERROR: some step on some host failed, check the bolded items above

Mon, Oct 14, 7:05 AM · Patch-For-Review, DC-Ops, ops-codfw, decommission, Operations
Marostegui updated the task description for T235399: Decommission db2068.codfw.wmnet.
Mon, Oct 14, 6:03 AM · Patch-For-Review, DC-Ops, ops-codfw, decommission, Operations