Jun 27 2019

Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

Thank you!

Jun 27 2019, 1:31 PM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

@Marostegui, which wikis are affected? Only English Wikipedia?
Do you need to display a banner too?

Jun 27 2019, 12:41 PM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui updated the task description for T220170: Address Database hardware infrastructure blockers on datacenter switchover & multi-dc deployment.
Jun 27 2019, 8:32 AM · Goal, DBA
Marostegui updated the task description for T217396: Decommission db1061-db1073.
Jun 27 2019, 7:29 AM · SRE, DBA
Marostegui reassigned T226689: decommission db1068 from Marostegui to RobH.

This host is ready for DCOPs to take over.

Jun 27 2019, 7:29 AM · DC-Ops, ops-eqiad, decommission-hardware, SRE
Marostegui updated the task description for T226689: decommission db1068.
Jun 27 2019, 7:28 AM · DC-Ops, ops-eqiad, decommission-hardware, SRE
Marostegui triaged T226689: decommission db1068 as Medium priority.
Jun 27 2019, 5:33 AM · DC-Ops, ops-eqiad, decommission-hardware, SRE
Marostegui updated the task description for T217396: Decommission db1061-db1073.
Jun 27 2019, 5:26 AM · SRE, DBA
Marostegui created T226689: decommission db1068.
Jun 27 2019, 5:26 AM · DC-Ops, ops-eqiad, decommission-hardware, SRE
Marostegui added a comment to T226685: HTTP 503 on zh.wikipedia.org.

We are having general connectivity issues

Jun 27 2019, 4:55 AM · SRE
Marostegui updated subscribers of T226685: HTTP 503 on zh.wikipedia.org.

@BBlack restarted varnish on that host. It should be ok now.

Jun 27 2019, 4:52 AM · SRE
Marostegui added a comment to T226685: HTTP 503 on zh.wikipedia.org.

We are looking into general connectivity issues at the moment

Jun 27 2019, 4:44 AM · SRE
Marostegui updated the task description for T222978: Compress and defragment tables on labsdb hosts.
Jun 27 2019, 4:35 AM · Data-Services, DBA

Jun 26 2019

Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

@Ladsgroup I believe that last time it wasn't necessary, but I am not 100% sure

I can run it, it's fine. Just drop me a ping

Jun 26 2019, 4:02 PM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui added a comment to T71222: list=logevents slow for users with last log action long time ago.

I wanted to test this issue with 10.3 on db1114.
I copied the logging, page and user tables from dewiki from one of the hosts that show the weird plans and placed them on this 10.3 server.

root@db1114.eqiad.wmnet[(none)]> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 10.3.16-MariaDB-log |
+---------------------+
1 row in set (0.00 sec)
Jun 26 2019, 1:44 PM · mariadb-optimizer-bug, User-Marostegui, MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), DBA, Performance Issue, MediaWiki-Action-API
Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

@Ladsgroup I believe that last time it wasn't necessary, but I am not 100% sure

Jun 26 2019, 1:15 PM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui awarded T206203: Implement database binary backups into the production infrastructure a Party Time token.
Jun 26 2019, 12:27 PM · Goal, DBA
Elitre awarded T224516: Database primary master failover on s4 (commonswiki) a Like token.
Jun 26 2019, 9:01 AM · User-notice-archive, User-Johan, Commons, MoveComms-Support (Apr-Jun-2019)
Marostegui edited projects for T196020: Consider adding ContentTranslation (CX) tables to wiki replicas, added: User-Marostegui; removed DBA.
Jun 26 2019, 7:24 AM · User-Marostegui, ContentTranslation, Data-Services
Marostegui updated subscribers of T220170: Address Database hardware infrastructure blockers on datacenter switchover & multi-dc deployment.

I had a chat with @mark and we are considering this Q4 goal done:

Jun 26 2019, 7:07 AM · Goal, DBA
Marostegui added a comment to T193224: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished.

@jcrespo you ok if I copy dewiki.logging into db1114

Sure, if you do it in its own separate schema.

Jun 26 2019, 6:52 AM · MediaWiki-General, SRE, DBA
Marostegui added a comment to T193224: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished.

@jcrespo you ok if I copy dewiki.logging into db1114? I would like to see the behaviour of 10.3 optimizer in regards to the query planner bug observed at T71222: list=logevents slow for users with last log action long time ago

Jun 26 2019, 6:24 AM · MediaWiki-General, SRE, DBA
Marostegui removed a project from T226546: babel database doesn't support language codes longer than 10 characters (e.g. de-x-formal): DBA.

Is it only on the wikis you pasted, or could it be on more?

Nope, every wiki should have the table (the extension is enabled everywhere but loginwiki and votewiki)

Jun 26 2019, 6:15 AM · MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Schema-change, affects-translatewiki.net, MediaWiki-extensions-Babel
Marostegui moved T226569: Degraded RAID on db1072 from Triage to In progress on the DBA board.
Jun 26 2019, 6:11 AM · DBA, ops-eqiad, SRE
Marostegui closed T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table as Resolved.

All done

Jun 26 2019, 6:09 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui closed T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table, a subtask of T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking), as Resolved.
Jun 26 2019, 6:09 AM · Epic, DBA, Tracking-Neverending
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Jun 26 2019, 6:09 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

Deletion process for s8 (wikidata). The table is 6GB there.
Not written since 29th March:

-rw-rw---- 1 mysql mysql 6.3G Mar 29 05:58 wikimedia_editor_tasks_entity_description_exists.ibd
Jun 26 2019, 5:56 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Jun 26 2019, 5:48 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Jun 26 2019, 5:47 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Jun 26 2019, 5:47 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

I have dropped this table from s3 (testwikidatawiki) which wasn't written since 27th March:

-rw-rw---- 1 mysql mysql 384K Mar 27 22:36 wikimedia_editor_tasks_entity_description_exists.ibd
Jun 26 2019, 5:46 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui added a parent task for T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC: T220170: Address Database hardware infrastructure blockers on datacenter switchover & multi-dc deployment.
Jun 26 2019, 5:38 AM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui added a subtask for T220170: Address Database hardware infrastructure blockers on datacenter switchover & multi-dc deployment: T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.
Jun 26 2019, 5:37 AM · Goal, DBA
Marostegui closed T222682: Productionize db11[26-38] as Resolved.

All these hosts are now provisioned

Jun 26 2019, 5:30 AM · Patch-For-Review, Goal, DBA
Marostegui closed T222682: Productionize db11[26-38], a subtask of T211613: rack/setup/install db11[26-38].eqiad.wmnet, as Resolved.
Jun 26 2019, 5:30 AM · Goal, DBA, ops-eqiad, User-Marostegui, SRE
Marostegui updated the task description for T222682: Productionize db11[26-38].
Jun 26 2019, 5:30 AM · Patch-For-Review, Goal, DBA
Marostegui added a comment to T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage.

Just a quick question: "this would fit a generalized parser cache mechanism" meaning it would fit into the existing parsercache mechanism (and infrastructure) or is that still to be defined?
Thanks!

Jun 26 2019, 4:58 AM · Platform Engineering Roadmap Decision Making, User-Addshore, Wikibase-Quality-Constraints, User-mobrovac, [DEPRECATED] wdwb-tech, TechCom-RFC, Wikibase-Quality, Wikidata

Jun 25 2019

Marostegui assigned T226569: Degraded RAID on db1072 to Cmjohnson.

Can we get this disk replaced - this is m3 master.
Thanks!

Jun 25 2019, 7:52 PM · DBA, ops-eqiad, SRE
Marostegui added a comment to T193224: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished.

db1114: Version 10.3.16-MariaDB-log, Uptime 937s, read_only: True, 122.17 QPS, connection latency: 0.003587s, query latency: 0.000728s

Jun 25 2019, 5:08 PM · MediaWiki-General, SRE, DBA
Marostegui closed T226519: Degraded RAID on db1077 as Declined.

This is not the RAID, this is the BBU which is broken - T225391#5261662 but the host is out of warranty

Jun 25 2019, 3:28 PM · ops-eqiad, SRE
Marostegui updated the task description for T222682: Productionize db11[26-38].
Jun 25 2019, 2:26 PM · Patch-For-Review, Goal, DBA
Marostegui closed T210725: Replace parsercache keys to something more meaningful on db-XXXX.php as Resolved.
Jun 25 2019, 1:21 PM · MediaWiki-libs-BagOStuff, Performance-Team (Radar), DBA, User-Marostegui
Marostegui closed T210725: Replace parsercache keys to something more meaningful on db-XXXX.php, a subtask of T133523: Decide how to improve parsercache replication, sharding and HA, as Resolved.
Jun 25 2019, 1:21 PM · SRE-Sprint-Week-Sustainability-March2023, MW-1.39-notes (1.39.0-wmf.22; 2022-07-25), Patch-For-Review, Epic, Sustainability (Incident Followup), DBA
Marostegui created P8654 (An Untitled Masterwork).
Jun 25 2019, 12:50 PM
Restricted Application added a project to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC: Product-Infrastructure-Team-Backlog-Deprecated.

Thanks a lot @Tgr I will tag those (better to tag them and they can remove themselves if it no longer applies) and update documentation accordingly.
Thanks again, very useful!

Jun 25 2019, 10:07 AM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui added a comment to T210725: Replace parsercache keys to something more meaningful on db-XXXX.php.

I have finished deploying the last key change. I did it in small batches during a few hours: https://grafana.wikimedia.org/render/d-solo/000000106/parser-cache?panelId=1&orgId=1&from=1561367808736&to=1561454208737&refresh=10s&var-contentModel=wikitext&width=1000&height=500&tz=Europe%2FMadrid

Jun 25 2019, 9:19 AM · MediaWiki-libs-BagOStuff, Performance-Team (Radar), DBA, User-Marostegui
Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

Thank you! :)

Jun 25 2019, 8:45 AM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

Thanks @Ladsgroup!
We have always talked about documenting who and which teams to tag when planning x1 switchovers, so I have created this https://wikitech.wikimedia.org/wiki/MariaDB#Special_section:_x1_master_switchover (based on this task and the previous ones).

Jun 25 2019, 8:07 AM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui closed T211613: rack/setup/install db11[26-38].eqiad.wmnet as Resolved.
Jun 25 2019, 7:40 AM · Goal, DBA, ops-eqiad, User-Marostegui, SRE
Marostegui closed T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T217396: Decommission db1061-db1073, as Resolved.
Jun 25 2019, 7:40 AM · SRE, DBA
Marostegui closed T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T220170: Address Database hardware infrastructure blockers on datacenter switchover & multi-dc deployment, as Resolved.
Jun 25 2019, 7:40 AM · Goal, DBA
Marostegui changed the status of T211613: rack/setup/install db11[26-38].eqiad.wmnet from Stalled to Open.

Finally db1133 has been installed correctly!
Thanks @Cmjohnson for getting it fixed!

root@db1133:~# megacli -LdPdInfo -a0 ; megacli -LdPdInfo -a0 | grep state ; megacli -LdPdInfo -a0 | grep -i Raw ;  megacli -LdPdInfo -a0 | grep state | wc -l ; free -g
Jun 25 2019, 7:38 AM · Goal, DBA, ops-eqiad, User-Marostegui, SRE
Marostegui changed the status of T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T217396: Decommission db1061-db1073, from Stalled to Open.
Jun 25 2019, 7:38 AM · SRE, DBA
Marostegui changed the status of T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T220170: Address Database hardware infrastructure blockers on datacenter switchover & multi-dc deployment, from Stalled to Open.
Jun 25 2019, 7:38 AM · Goal, DBA
Marostegui closed T222731: Storage problems with new host db1133 as Resolved.

I have re-imaged the host after Chris did it yesterday and everything looks good: RAID, memory, CPUS...

root@db1133:~# megacli -LdPdInfo -a0
Jun 25 2019, 5:32 AM · ops-eqiad, SRE
Marostegui closed T222731: Storage problems with new host db1133, a subtask of T211613: rack/setup/install db11[26-38].eqiad.wmnet, as Resolved.
Jun 25 2019, 5:32 AM · Goal, DBA, ops-eqiad, User-Marostegui, SRE
Marostegui updated the task description for T222682: Productionize db11[26-38].
Jun 25 2019, 5:13 AM · Patch-For-Review, Goal, DBA

Jun 24 2019

Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Jun 24 2019, 5:58 PM · SRE, DBA
Marostegui closed T225889: Degraded RAID on db2043 as Resolved.

The RAID finished correctly, although the disk came with predictive failure.
I am going to close this task as resolved, as ops-monitoring will open a new one once it has failed again:

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Predictive Failure)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, OK)
physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 600 GB, OK)
physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, OK)
physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 600 GB, OK)
physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 600 GB, OK)
physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 600 GB, OK)
physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, OK)
physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 600 GB, OK)
physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, OK)
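
A listing like the one above can be checked mechanically. This is a minimal Python sketch, assuming the controller output keeps the `physicaldrive <id> (port ..., <type>, <size>, <status>)` shape shown here. The function name `non_ok_drives` is hypothetical and is not part of any existing monitoring tool.

```python
import re
from typing import Dict

# Matches lines such as:
# physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Predictive Failure)
DRIVE_LINE = re.compile(r"^physicaldrive (\S+) \((.*)\)$")


def non_ok_drives(output: str) -> Dict[str, str]:
    """Return {drive_id: status} for every drive whose status is not 'OK'."""
    result = {}
    for line in output.splitlines():
        match = DRIVE_LINE.match(line.strip())
        if match is None:
            continue  # not a physicaldrive line
        drive_id, fields = match.groups()
        # The status is assumed to be the last comma-separated field.
        status = fields.rsplit(", ", 1)[-1]
        if status != "OK":
            result[drive_id] = status
    return result
```

Fed the twelve lines above, it would report only drive `1I:1:3` with the status `Predictive Failure`.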
Jun 24 2019, 5:57 PM · DBA, SRE, ops-codfw
Marostegui added a comment to T206203: Implement database binary backups into the production infrastructure.

\o/

Jun 24 2019, 4:44 PM · Goal, DBA
Marostegui reassigned T225889: Degraded RAID on db2043 from Marostegui to Papaul.

It failed already :(

physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed)
Jun 24 2019, 2:35 PM · DBA, SRE, ops-codfw
Marostegui updated the task description for T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.
Jun 24 2019, 2:06 PM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui moved T225988: decommission db2039 from Backlog to Ready for Decommission on the decommission-hardware board.
Jun 24 2019, 1:35 PM · Patch-For-Review, DC-Ops, ops-codfw, decommission-hardware, SRE
Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Jun 24 2019, 1:12 PM · SRE, DBA
Marostegui added a comment to T222050: db1107 (eventlogging db master) possibly memory issues.

@Cmjohnson as per the error @jcrespo pasted above, do you think that is enough to get Dell to send a new DIMM?

Jun 24 2019, 8:41 AM · Analytics, SRE, ops-eqiad, MediaWiki-extensions-EventLogging, DBA
Marostegui triaged T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC as Medium priority.
Jun 24 2019, 7:59 AM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui created T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.
Jun 24 2019, 7:59 AM · Wikidata, User-notice-archive, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, Cognate, Language-Team, Growth-Team, SRE, DBA
Marostegui updated the task description for T222682: Productionize db11[26-38].
Jun 24 2019, 6:37 AM · Patch-For-Review, Goal, DBA
Marostegui created P8644 (An Untitled Masterwork).
Jun 24 2019, 6:02 AM
Marostegui updated the task description for T222682: Productionize db11[26-38].
Jun 24 2019, 5:23 AM · Patch-For-Review, Goal, DBA
Marostegui updated the task description for T222682: Productionize db11[26-38].
Jun 24 2019, 5:23 AM · Patch-For-Review, Goal, DBA
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

Same has been done on testwikidatawiki on s3:

root@db1123.eqiad.wmnet[testwikidatawiki]> rename table wikimedia_editor_tasks_entity_description_exists to T226326_wikimedia_editor_tasks_entity_description_exists;
Query OK, 0 rows affected (0.01 sec)
Jun 24 2019, 4:59 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui claimed T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

So for now I have renamed the table on db1092 and will leave it like that for a couple of days before dropping it for good, just to see if there are some unexpected issues:

root@db1092.eqiad.wmnet[wikidatawiki]> rename table wikimedia_editor_tasks_entity_description_exists to T226326_wikimedia_editor_tasks_entity_description_exists;
Query OK, 0 rows affected (0.01 sec)
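
The rename-first, drop-later approach described above can be sketched as a pair of statement builders. This is only an illustration in Python: the helper names `rename_statement` and `drop_statement` are hypothetical, and the task-prefix naming simply mirrors the statement shown in this comment.

```python
MAX_IDENTIFIER_LENGTH = 64  # MariaDB limit for table names


def parked_name(table: str, task: str) -> str:
    """Name the table is parked under while waiting to be dropped."""
    name = f"{task}_{table}"
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError(f"{name!r} exceeds {MAX_IDENTIFIER_LENGTH} characters")
    return name


def rename_statement(table: str, task: str) -> str:
    """Step 1: rename the table so any remaining reader or writer fails loudly."""
    return f"RENAME TABLE `{table}` TO `{parked_name(table, task)}`;"


def drop_statement(table: str, task: str) -> str:
    """Step 2, a few days later: drop the parked table for good."""
    return f"DROP TABLE `{parked_name(table, task)}`;"
```

Renaming is cheap to undo with the reverse `RENAME TABLE`, which is why it works as a waiting period before the irreversible drop.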
Jun 24 2019, 4:56 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui added a parent task for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table: T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking).
Jun 24 2019, 4:53 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks
Marostegui added a subtask for T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking): T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Jun 24 2019, 4:53 AM · Epic, DBA, Tracking-Neverending
Marostegui removed a project from T225169: [4 hours] Investigate whether it's efficient to order by tag value (DBA input requested): DBA.
Jun 24 2019, 4:49 AM · Community-Tech (Kanban (Q1 2019-20)), Spike, Growth-Team, PageTriage
Marostegui added a comment to T226337: SpecialConfirmEmail causes "MWException: CAS update failed on user_touched" from User.php.

Some more details:

Jun 24 2019, 4:49 AM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), Sustainability, Wikimedia-production-error, MediaWiki-Core-Preferences

Jun 23 2019

Marostegui added a comment to T226337: SpecialConfirmEmail causes "MWException: CAS update failed on user_touched" from User.php.

Thanks for creating the task :)

Jun 23 2019, 8:15 PM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Platform Team Workboards (Clinic Duty Team), Performance-Team (Radar), Sustainability, Wikimedia-production-error, MediaWiki-Core-Preferences
Marostegui added a comment to T226297: ERROR 2013 (HY000): Lost connection to MySQL server during query on replicas.

Yeah, essentially we have 3 hosts. Usually only one of them is dedicated to the long queries (analytics) and 2 of them to the web service (fast queries), but due to the maintenance (T222978) we now have 1 host serving analytics which also serves a portion of web, and hence it is more loaded than normal.
This is the change: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518029/

Jun 23 2019, 8:39 AM · Data-Services
Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Jun 23 2019, 5:43 AM · SRE, DBA
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

Can this go anytime then?

Jun 23 2019, 5:23 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Product-Infrastructure-Team-Backlog-Deprecated, WikimediaEditorTasks

Jun 22 2019

Marostegui added a comment to T226297: ERROR 2013 (HY000): Lost connection to MySQL server during query on replicas.

It is because labsdb1010 is serving (temporarily while we do some maintenance on 1011) analytics but still has the query killer set to 300 seconds instead of 14400 (14400 is the one we use for the long query hosts). I have changed it and it is now set to 14400 so it should not be killing those small queries anymore.
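
To illustrate why the 300-second limit was cutting off analytics queries, here is a minimal Python sketch of the threshold check. The two limits come from the comment above. The role names, the mapping of 300 seconds to a "web" role, and the function `should_kill` are assumptions made for illustration, not the actual query killer.

```python
# Limits in seconds. 300 and 14400 are quoted in the comment above;
# the role labels are hypothetical.
QUERY_KILLER_LIMIT_SECONDS = {
    "web": 300,          # short-query hosts (assumed role name)
    "analytics": 14400,  # long-query hosts, i.e. 4 hours
}


def should_kill(role: str, runtime_seconds: float) -> bool:
    """Return True when a query has run longer than the limit for its host role."""
    return runtime_seconds > QUERY_KILLER_LIMIT_SECONDS[role]
```

A query that has run for 10 minutes (600 seconds) is killed under the "web" limit but left alone under the "analytics" limit, which matches the symptom reported in the task.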

Jun 22 2019, 11:08 AM · Data-Services

Jun 21 2019

Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jun 21 2019, 9:30 AM · Patch-For-Review, DBA
Marostegui closed T225704: eqiad: rack/setup/install (4) dbproxy systems. as Resolved.

All hosts installed

Jun 21 2019, 9:30 AM · Patch-For-Review, SRE, DBA
Marostegui closed T225704: eqiad: rack/setup/install (4) dbproxy systems., a subtask of T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4], as Resolved.
Jun 21 2019, 9:30 AM · Patch-For-Review, DBA
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 21 2019, 9:30 AM · Patch-For-Review, SRE, DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jun 21 2019, 9:14 AM · Patch-For-Review, DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jun 21 2019, 9:14 AM · Patch-For-Review, DBA
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 21 2019, 9:13 AM · Patch-For-Review, SRE, DBA
Marostegui moved T225169: [4 hours] Investigate whether it's efficient to order by tag value (DBA input requested) from Triage to Done on the DBA board.

I have been checking this query on enwiki and it doesn't seem to be too bad:

root@db1089.eqiad.wmnet[enwiki]> FLUSH STATUS; pager cat > /dev/null; SELECT page_namespace, page_title, ptrpt_value FROM pagetriage_page_tags JOIN page ON ptrpt_page_id = page_id WHERE ptrpt_tag_id = 2 ORDER BY CAST(ptrpt_value AS SIGNED) DESC LIMIT 10; ; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.00 sec)
Jun 21 2019, 8:29 AM · Community-Tech (Kanban (Q1 2019-20)), Spike, Growth-Team, PageTriage
Marostegui claimed T225704: eqiad: rack/setup/install (4) dbproxy systems..

While debugging with Arzhel we noticed that the DNS entries for dbproxy1018 and dbproxy1019 didn't belong to the cloud network. I have changed them and I will try to install again.

Jun 21 2019, 8:08 AM · Patch-For-Review, SRE, DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Jun 21 2019, 7:30 AM · Patch-For-Review, DBA
Marostegui reassigned T225704: eqiad: rack/setup/install (4) dbproxy systems. from Marostegui to Cmjohnson.

@Cmjohnson @ayounsi is there anything special with the VLANs and PXE for dbproxy1018 and dbproxy1019? Neither of them seems to be booting from PXE, even though the MACs I added on tftpboot are the same ones the IDRAC shows they are trying to boot from:
dbproxy1018 4C:D9:8F:6C:A5:9E https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518197/1/modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200
dbproxy1019 4C:D9:8F:6C:9F:2F https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518203/2/modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200

Jun 21 2019, 7:30 AM · Patch-For-Review, SRE, DBA
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 21 2019, 7:29 AM · Patch-For-Review, SRE, DBA
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Jun 21 2019, 6:48 AM · Patch-For-Review, SRE, DBA
Marostegui closed T225884: db2084 temporary correctable hardware errors as Resolved.

And it finally cleared up

23:38:30 <+icinga-wm> RECOVERY - EDAC syslog messages on db2084 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db2084&var-datasource=codfw+prometheus/ops
Jun 21 2019, 4:57 AM · SRE, DBA

Jun 20 2019

Marostegui added a comment to T225889: Degraded RAID on db2043.

And the disk failed again

Jun 20 2019, 4:54 PM · DBA, SRE, ops-codfw
Marostegui closed T226194: Degraded RAID on db2043 as Declined.

Duplicate of T225889

Jun 20 2019, 3:59 PM · SRE, ops-codfw
Marostegui added a comment to T225704: eqiad: rack/setup/install (4) dbproxy systems..

@RobH if you add the production DNS entries, I can take care of the installations myself

Jun 20 2019, 3:27 PM · Patch-For-Review, SRE, DBA