Phabricator

Marostegui (Manuel Aróstegui)
User

Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Friday

  • Clear sailing ahead.

User Details

User Since
Sep 1 2016, 6:48 AM (146 w, 6 d)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF) [ Global Accounts ]

TZ: UTC +1/+2

Recent Activity

Today

Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

@Ladsgroup I believe that last time it wasn't necessary, but I am not 100% sure

I can run it, it's fine. Just drop me a ping

Wed, Jun 26, 4:02 PM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui added a comment to T71222: list=logevents slow for users with last log action long time ago.

I wanted to test this issue with 10.3 on db1114.
I copied the logging, page and user tables for dewiki from one of the hosts that show the weird plans and placed them on this 10.3 server.

root@db1114.eqiad.wmnet[(none)]> select @@version;
+---------------------+
| @@version           |
+---------------------+
| 10.3.16-MariaDB-log |
+---------------------+
1 row in set (0.00 sec)
Wed, Jun 26, 1:44 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), DBA, Performance, MediaWiki-API
Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

@Ladsgroup I believe that last time it wasn't necessary, but I am not 100% sure

Wed, Jun 26, 1:15 PM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui awarded T206203: Implement database binary backups into the production infrastructure a Party Time token.
Wed, Jun 26, 12:27 PM · Patch-For-Review, Goal, DBA
Elitre awarded T224516: Database primary master failover on s4 (commonswiki) a Like token.
Wed, Jun 26, 9:01 AM · User-Johan, Commons, CommRel-Specialists-Support (Apr-Jun-2019), User-notice
Marostegui edited projects for T196020: Consider adding ContentTranslation (CX) tables to wiki replicas, added: User-Marostegui; removed DBA.
Wed, Jun 26, 7:24 AM · User-Marostegui, ContentTranslation, Data-Services
Marostegui updated subscribers of T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment.

I had a chat with @mark and we are considering this Q4 goal done:

Wed, Jun 26, 7:07 AM · Goal, DBA
Marostegui added a comment to T193224: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished.

@jcrespo you ok if I copy dewiki.logging into db1114

Sure, if you do it in its own separate schema.

Wed, Jun 26, 6:52 AM · Patch-For-Review, MediaWiki-Database, Operations, DBA
Marostegui added a comment to T193224: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished.

@jcrespo you ok if I copy dewiki.logging into db1114? I would like to see the behaviour of the 10.3 optimizer regarding the query planner bug observed at T71222: list=logevents slow for users with last log action long time ago

Wed, Jun 26, 6:24 AM · Patch-For-Review, MediaWiki-Database, Operations, DBA
Marostegui removed a project from T226546: babel database doesn't support language codes longer than 10 characters (e.g. de-x-formal): DBA.

Is it only on the wikis you pasted, or could it be on more?

Nope, every wiki should have the table (the extension is enabled everywhere but loginwiki and votewiki)

Wed, Jun 26, 6:15 AM · translatewiki.net, MediaWiki-extensions-Babel
Marostegui moved T226569: Degraded RAID on db1072 from Triage to In progress on the DBA board.
Wed, Jun 26, 6:11 AM · DBA, ops-eqiad, Operations
Marostegui closed T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table as Resolved.

All done

Wed, Jun 26, 6:09 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui closed T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table, a subtask of T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking), as Resolved.
Wed, Jun 26, 6:09 AM · Epic, DBA, Tracking-Neverending
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Wed, Jun 26, 6:09 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

Deletion process for s8 (wikidata). The table is 6.3GB there and has not been written to since 29th March:

-rw-rw---- 1 mysql mysql 6.3G Mar 29 05:58 wikimedia_editor_tasks_entity_description_exists.ibd
Wed, Jun 26, 5:56 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
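The "not written since" checks in these comments rely on the modification time of the table's .ibd file (file-per-table InnoDB). A minimal Python sketch of that heuristic follows; the function names, the idle threshold and the datadir path in the usage comment are hypothetical, not taken from the production tooling. Because only writes update mtime, it cannot tell whether the table is still being read.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def days_since_last_write(path: str, now: Optional[float] = None) -> float:
    """Days elapsed since `path` was last modified, based on its mtime."""
    if now is None:
        now = time.time()
    return (now - os.stat(path).st_mtime) / SECONDS_PER_DAY


def looks_unwritten(path: str, min_idle_days: float,
                    now: Optional[float] = None) -> bool:
    """True if the file has not been modified for at least `min_idle_days`.

    Heuristic only: reads never update mtime, so a table that is still
    queried (but not written) also looks idle.
    """
    return days_since_last_write(path, now) >= min_idle_days


# Hypothetical usage; the datadir path is an assumption:
# looks_unwritten("/srv/sqldata/wikidatawiki/"
#                 "wikimedia_editor_tasks_entity_description_exists.ibd", 30)
```

This is why the table was also renamed before being dropped: the mtime check only rules out writers.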
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Wed, Jun 26, 5:48 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Wed, Jun 26, 5:47 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui updated the task description for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Wed, Jun 26, 5:47 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

I have dropped this table from s3 (testwikidatawiki), where it had not been written to since 27th March:

-rw-rw---- 1 mysql mysql 384K Mar 27 22:36 wikimedia_editor_tasks_entity_description_exists.ibd
Wed, Jun 26, 5:46 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui added a parent task for T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC: T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment.
Wed, Jun 26, 5:38 AM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui added a subtask for T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment: T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.
Wed, Jun 26, 5:37 AM · Goal, DBA
Marostegui closed T222682: Productionize db11[26-38] as Resolved.

All these hosts are now provisioned

Wed, Jun 26, 5:30 AM · Patch-For-Review, Goal, DBA
Marostegui closed T222682: Productionize db11[26-38], a subtask of T211613: rack/setup/install db11[26-38].eqiad.wmnet, as Resolved.
Wed, Jun 26, 5:30 AM · Goal, DBA, ops-eqiad, User-Marostegui, Operations
Marostegui updated the task description for T222682: Productionize db11[26-38].
Wed, Jun 26, 5:30 AM · Patch-For-Review, Goal, DBA
Marostegui added a comment to T214362: RFC: Store WikibaseQualityConstraint check data in persistent storage.

Just a quick question: does "this would fit a generalized parser cache mechanism" mean it would fit into the existing parsercache mechanism (and infrastructure), or is that still to be defined?
Thanks!

Wed, Jun 26, 4:58 AM · Core Platform Team Backlog (Designing), Services (designing), User-mobrovac, wikidata-tech-focus, TechCom-RFC, Wikibase-Quality, Wikidata

Yesterday

Marostegui assigned T226569: Degraded RAID on db1072 to Cmjohnson.

Can we get this disk replaced? This is the m3 master.
Thanks!

Tue, Jun 25, 7:52 PM · DBA, ops-eqiad, Operations
Marostegui added a comment to T193224: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished.

db1114: Version 10.3.16-MariaDB-log, Uptime 937s, read_only: True, 122.17 QPS, connection latency: 0.003587s, query latency: 0.000728s

Tue, Jun 25, 5:08 PM · Patch-For-Review, MediaWiki-Database, Operations, DBA
Marostegui closed T226519: Degraded RAID on db1077 as Declined.

It is not the RAID but the BBU that is broken (T225391#5261662), and the host is out of warranty.

Tue, Jun 25, 3:28 PM · ops-eqiad, Operations
Marostegui updated the task description for T222682: Productionize db11[26-38].
Tue, Jun 25, 2:26 PM · Patch-For-Review, Goal, DBA
Marostegui closed T210725: Replace parsercache keys to something more meaningful on db-XXXX.php as Resolved.
Tue, Jun 25, 1:21 PM · MediaWiki-Cache, Performance-Team (Radar), DBA, User-Marostegui
Marostegui closed T210725: Replace parsercache keys to something more meaningful on db-XXXX.php, a subtask of T133523: [RFC] improve parsercache replication and sharding handling, as Resolved.
Tue, Jun 25, 1:21 PM · Patch-For-Review, Operations, codfw-rollout, DBA
Marostegui created P8654 (An Untitled Masterwork).
Tue, Jun 25, 12:50 PM
Restricted Application added a project to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC: Reading-Infrastructure-Team-Backlog.

Thanks a lot @Tgr! I will tag those teams (better to tag them; they can remove themselves if it no longer applies) and update the documentation accordingly.
Thanks again, very useful!

Tue, Jun 25, 10:07 AM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui added a comment to T210725: Replace parsercache keys to something more meaningful on db-XXXX.php.

I have finished deploying the last key change. I did it in small batches over a few hours: https://grafana.wikimedia.org/render/d-solo/000000106/parser-cache?panelId=1&orgId=1&from=1561367808736&to=1561454208737&refresh=10s&var-contentModel=wikitext&width=1000&height=500&tz=Europe%2FMadrid

Tue, Jun 25, 9:19 AM · MediaWiki-Cache, Performance-Team (Radar), DBA, User-Marostegui
Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

Thank you! :)

Tue, Jun 25, 8:45 AM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui added a comment to T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.

Thanks @Ladsgroup!
We have always talked about documenting who and which teams to tag when planning x1 switchovers, so I have created this https://wikitech.wikimedia.org/wiki/MariaDB#Special_section:_x1_master_switchover (based on this task and the previous ones).

Tue, Jun 25, 8:07 AM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui closed T211613: rack/setup/install db11[26-38].eqiad.wmnet as Resolved.
Tue, Jun 25, 7:40 AM · Goal, DBA, ops-eqiad, User-Marostegui, Operations
Marostegui closed T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T217396: Decommission db1061-db1073, as Resolved.
Tue, Jun 25, 7:40 AM · Operations, DBA
Marostegui closed T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment, as Resolved.
Tue, Jun 25, 7:40 AM · Goal, DBA
Marostegui changed the status of T211613: rack/setup/install db11[26-38].eqiad.wmnet from Stalled to Open.

Finally db1133 has been installed correctly!
Thanks @Cmjohnson for getting it fixed!

root@db1133:~# megacli -LdPdInfo -a0 ; megacli -LdPdInfo -a0 | grep state ; megacli -LdPdInfo -a0 | grep -i Raw ;  megacli -LdPdInfo -a0 | grep state | wc -l ; free -g
Tue, Jun 25, 7:38 AM · Goal, DBA, ops-eqiad, User-Marostegui, Operations
Marostegui changed the status of T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T217396: Decommission db1061-db1073, from Stalled to Open.
Tue, Jun 25, 7:38 AM · Operations, DBA
Marostegui changed the status of T211613: rack/setup/install db11[26-38].eqiad.wmnet, a subtask of T220170: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment, from Stalled to Open.
Tue, Jun 25, 7:38 AM · Goal, DBA
Marostegui closed T222731: Storage problems with new host db1133 as Resolved.

I have re-imaged the host after Chris did it yesterday and everything looks good: RAID, memory, CPUs...

root@db1133:~# megacli -LdPdInfo -a0
Tue, Jun 25, 5:32 AM · ops-eqiad, Operations
Marostegui closed T222731: Storage problems with new host db1133, a subtask of T211613: rack/setup/install db11[26-38].eqiad.wmnet, as Resolved.
Tue, Jun 25, 5:32 AM · Goal, DBA, ops-eqiad, User-Marostegui, Operations
Marostegui updated the task description for T222682: Productionize db11[26-38].
Tue, Jun 25, 5:13 AM · Patch-For-Review, Goal, DBA

Mon, Jun 24

Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Mon, Jun 24, 5:58 PM · Operations, DBA
Marostegui closed T225889: Degraded RAID on db2043 as Resolved.

The RAID rebuild finished correctly, although the disk came with a predictive failure.
I am going to close this task as resolved, as ops-monitoring will open a new one once the disk actually fails:

physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 600 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 600 GB, OK)
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Predictive Failure)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 600 GB, OK)
physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 600 GB, OK)
physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, OK)
physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SAS, 600 GB, OK)
physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SAS, 600 GB, OK)
physicaldrive 1I:1:9 (port 1I:box 1:bay 9, SAS, 600 GB, OK)
physicaldrive 1I:1:10 (port 1I:box 1:bay 10, SAS, 600 GB, OK)
physicaldrive 1I:1:11 (port 1I:box 1:bay 11, SAS, 600 GB, OK)
physicaldrive 1I:1:12 (port 1I:box 1:bay 12, SAS, 600 GB, OK)
Mon, Jun 24, 5:57 PM · DBA, Operations, ops-codfw
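Scanning twelve physicaldrive lines by eye is where a "Predictive Failure" entry is easy to miss. A small Python sketch that flags every drive whose status is not OK; the regex assumes the exact `physicaldrive <id> (<details>, <status>)` layout of the hpssacli output pasted above and is not an official parser for that tool.

```python
import re
from typing import Dict

# Matches e.g. "physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed)"
DRIVE_RE = re.compile(r"physicaldrive\s+(?P<id>\S+)\s+\((?P<details>[^)]*)\)")


def unhealthy_drives(output: str) -> Dict[str, str]:
    """Map drive id -> status for every physicaldrive not reported as OK.

    The status is assumed to be the last comma-separated field inside
    the parentheses, as in the listing above.
    """
    result = {}
    for match in DRIVE_RE.finditer(output):
        status = match.group("details").split(",")[-1].strip()
        if status != "OK":
            result[match.group("id")] = status
    return result
```

Run against the listing above, it would report only 1I:1:3 with "Predictive Failure"; against the later one-line listing, the same drive with "Failed".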
Marostegui added a comment to T206203: Implement database binary backups into the production infrastructure.

\o/

Mon, Jun 24, 4:44 PM · Patch-For-Review, Goal, DBA
Marostegui reassigned T225889: Degraded RAID on db2043 from Marostegui to Papaul.

It failed already :(

physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed)
Mon, Jun 24, 2:35 PM · DBA, Operations, ops-codfw
Marostegui updated the task description for T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.
Mon, Jun 24, 2:06 PM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui moved T225988: decommission db2039 from Backlog to Ready for Decommission on the decommission board.
Mon, Jun 24, 1:35 PM · DC-Ops, ops-codfw, decommission, Operations
Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Mon, Jun 24, 1:12 PM · Operations, DBA
Marostegui added a comment to T222050: db1107 (eventlogging db master) possibly memory issues.

@Cmjohnson do you think the error @jcrespo pasted above is enough to get Dell to send a new DIMM?

Mon, Jun 24, 8:41 AM · Analytics, Operations, ops-eqiad, Analytics-EventLogging, DBA
Marostegui triaged T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC as Normal priority.
Mon, Jun 24, 7:59 AM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui created T226358: Failover x1 master: db1069 to db1120 3rd July at 06:00 UTC.
Mon, Jun 24, 7:59 AM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks, Reading List Service, ContentTranslation, MediaWiki-extensions-BounceHandler, StructuredDiscussions, MediaWiki-extensions-UrlShortener, User-notice, Cognate, Language-Team, Growth-Team, Operations, DBA
Marostegui updated the task description for T222682: Productionize db11[26-38].
Mon, Jun 24, 6:37 AM · Patch-For-Review, Goal, DBA
Marostegui created P8644 (An Untitled Masterwork).
Mon, Jun 24, 6:02 AM
Marostegui updated the task description for T222682: Productionize db11[26-38].
Mon, Jun 24, 5:23 AM · Patch-For-Review, Goal, DBA
Marostegui updated the task description for T222682: Productionize db11[26-38].
Mon, Jun 24, 5:23 AM · Patch-For-Review, Goal, DBA
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

Same has been done on testwikidatawiki on s3:

root@db1123.eqiad.wmnet[testwikidatawiki]> rename table wikimedia_editor_tasks_entity_description_exists to T226326_wikimedia_editor_tasks_entity_description_exists;
Query OK, 0 rows affected (0.01 sec)
Mon, Jun 24, 4:59 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui claimed T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

So for now I have renamed the table on db1092 and will leave it like that for a couple of days before dropping it for good, just to see if there are some unexpected issues:

root@db1092.eqiad.wmnet[wikidatawiki]> rename table wikimedia_editor_tasks_entity_description_exists to T226326_wikimedia_editor_tasks_entity_description_exists;
Query OK, 0 rows affected (0.01 sec)
Mon, Jun 24, 4:56 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
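Renaming before dropping makes the removal reversible: anything still using the old name fails loudly, and renaming it back restores the table instantly. A sketch of that two-step pattern, using Python's built-in sqlite3 as a stand-in for MariaDB (an assumption for illustration; the production step was the RENAME TABLE shown above). The helper names are hypothetical.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


# Table names are interpolated directly: they come from the operator,
# never from user input.
def park_table(conn: sqlite3.Connection, table: str, task_id: str) -> str:
    """Step 1: rename the table to <task_id>_<table> instead of dropping it."""
    parked = f"{task_id}_{table}"
    conn.execute(f'ALTER TABLE "{table}" RENAME TO "{parked}"')
    return parked


def unpark_table(conn: sqlite3.Connection, parked: str, table: str) -> None:
    """Rollback: restore the original name if something still needs it."""
    conn.execute(f'ALTER TABLE "{parked}" RENAME TO "{table}"')


def drop_parked(conn: sqlite3.Connection, parked: str) -> None:
    """Step 2, after the grace period: drop the renamed table for good."""
    conn.execute(f'DROP TABLE "{parked}"')
```

Prefixing the new name with the task ID keeps the parked table easy to find and ties it back to the ticket that justified the drop.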
Marostegui added a parent task for T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table: T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking).
Mon, Jun 24, 4:53 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks
Marostegui added a subtask for T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking): T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.
Mon, Jun 24, 4:53 AM · Epic, DBA, Tracking-Neverending
Marostegui removed a project from T225169: [4 hours] Investigate whether it's efficient to order by tag value (DBA input requested): DBA.
Mon, Jun 24, 4:49 AM · Community-Tech (Kanban), Spike, Growth-Team, PageCuration
Marostegui added a comment to T226337: (frwiki) User.php: CAS update failed on user_touched. The version of the user to be saved is older than the current version .

Some more details:

Mon, Jun 24, 4:49 AM · Availability, Wikimedia-production-error, MediaWiki-User-preferences

Sun, Jun 23

Marostegui added a comment to T226337: (frwiki) User.php: CAS update failed on user_touched. The version of the user to be saved is older than the current version .

Thanks for creating the task :)

Sun, Jun 23, 8:15 PM · Availability, Wikimedia-production-error, MediaWiki-User-preferences
Marostegui added a comment to T226297: ERROR 2013 (HY000): Lost connection to MySQL server during query on replicas.

Yeah, essentially we have 3 hosts. Usually only one of them is dedicated to the long queries (analytics) and 2 of them to the web service (fast queries), but due to the maintenance (T222978) we now have 1 host serving analytics which also serves a portion of web, and hence it is more loaded than normal.
This is the change: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518029/

Sun, Jun 23, 8:39 AM · Data-Services
Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Sun, Jun 23, 5:43 AM · Operations, DBA
Marostegui added a comment to T226326: Drop the `wikimedia_editor_tasks_entity_description_exists` table.

Can this go anytime then?

Sun, Jun 23, 5:23 AM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Schema-change, DBA, Beta-Cluster-Infrastructure, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks

Sat, Jun 22

Marostegui added a comment to T226297: ERROR 2013 (HY000): Lost connection to MySQL server during query on replicas.

It is because labsdb1010 is temporarily serving analytics (while we do some maintenance on 1011) but still had the query killer set to 300 seconds instead of 14400, the value we use for the long-query hosts. I have changed it to 14400, so it should no longer kill those small queries.

Sat, Jun 22, 11:08 AM · Data-Services
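The failure mode above comes down to one threshold per host role. A toy Python model of that decision, using the two values from the comment (300 s for web hosts, 14400 s for analytics hosts); the role names, the RunningQuery type and the "kill if strictly longer" rule are assumptions, and this is not the actual query killer running on the wiki replicas.

```python
from dataclasses import dataclass
from typing import List

# Thresholds from the comment above, in seconds.
KILL_AFTER_SECONDS = {"web": 300, "analytics": 14400}


@dataclass
class RunningQuery:
    query_id: int
    seconds: int  # how long the query has been running


def queries_to_kill(role: str, queries: List[RunningQuery]) -> List[int]:
    """Ids of the queries running longer than the threshold for `role`."""
    limit = KILL_AFTER_SECONDS[role]
    return [q.query_id for q in queries if q.seconds > limit]
```

With the web limit still applied, a 15-minute analytics query is killed; with the analytics limit it survives, which matches the ERROR 2013 reports disappearing after the change.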

Fri, Jun 21

Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Fri, Jun 21, 9:30 AM · Patch-For-Review, DBA
Marostegui closed T225704: eqiad: rack/setup/install (4) dbproxy systems. as Resolved.

All hosts installed

Fri, Jun 21, 9:30 AM · Patch-For-Review, Operations, DBA
Marostegui closed T225704: eqiad: rack/setup/install (4) dbproxy systems., a subtask of T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4], as Resolved.
Fri, Jun 21, 9:30 AM · Patch-For-Review, DBA
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Fri, Jun 21, 9:30 AM · Patch-For-Review, Operations, DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Fri, Jun 21, 9:14 AM · Patch-For-Review, DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Fri, Jun 21, 9:14 AM · Patch-For-Review, DBA
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Fri, Jun 21, 9:13 AM · Patch-For-Review, Operations, DBA
Marostegui moved T225169: [4 hours] Investigate whether it's efficient to order by tag value (DBA input requested) from Triage to Done on the DBA board.

I have been checking this query on enwiki and it doesn't seem to be too bad:

root@db1089.eqiad.wmnet[enwiki]> FLUSH STATUS; pager cat > /dev/null; SELECT page_namespace, page_title, ptrpt_value FROM pagetriage_page_tags JOIN page ON ptrpt_page_id = page_id WHERE ptrpt_tag_id = 2 ORDER BY CAST(ptrpt_value AS SIGNED) DESC LIMIT 10; nopager; SHOW STATUS like 'Hand%';
Query OK, 0 rows affected (0.00 sec)
Fri, Jun 21, 8:29 AM · Community-Tech (Kanban), Spike, Growth-Team, PageCuration
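ptrpt_value appears to be stored as a string, hence the CAST in the query above: without it, values sort lexicographically ("9" before "100"). In general the cast also means an index on the raw column cannot supply the order, so the server has to sort every row matching ptrpt_tag_id = 2 before applying LIMIT 10, which is what the Handler counters help measure. A sketch of the ordering difference using Python's sqlite3 as a stand-in; the table, its columns and the data are hypothetical, and SQLite spells the cast AS INTEGER where MariaDB uses AS SIGNED.

```python
import sqlite3
from typing import List


def make_demo_table() -> sqlite3.Connection:
    """A tiny stand-in for pagetriage_page_tags: the value column holds text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (page_id INTEGER, value TEXT)")
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [(1, "9"), (2, "10"), (3, "100"), (4, "25")],
    )
    return conn


def top_pages(conn: sqlite3.Connection, order_expr: str, limit: int = 3) -> List[int]:
    # order_expr is a trusted, hard-coded expression in this demo.
    sql = f"SELECT page_id FROM tags ORDER BY {order_expr} DESC LIMIT ?"
    return [row[0] for row in conn.execute(sql, (limit,))]


conn = make_demo_table()
print(top_pages(conn, "value"))                   # text order: [1, 4, 3]
print(top_pages(conn, "CAST(value AS INTEGER)"))  # numeric order: [3, 4, 2]
```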
Marostegui claimed T225704: eqiad: rack/setup/install (4) dbproxy systems..

While debugging with Arzhel, we noticed that the DNS entries for dbproxy1018 and dbproxy1019 didn't belong to the cloud network. I have changed them and will try to install the hosts again.

Fri, Jun 21, 8:08 AM · Patch-For-Review, Operations, DBA
Marostegui updated the task description for T202367: Productionize dbproxy101[2-7].eqiad.wmnet and dbproxy200[1-4].
Fri, Jun 21, 7:30 AM · Patch-For-Review, DBA
Marostegui reassigned T225704: eqiad: rack/setup/install (4) dbproxy systems. from Marostegui to Cmjohnson.

@Cmjohnson @ayounsi is there anything special about the dbproxy1018 and dbproxy1019 VLANs and PXE? Neither of them seems to boot from PXE, even though the MACs I added to tftpboot are the same ones the iDRAC shows it is trying to boot from:
dbproxy1018 4C:D9:8F:6C:A5:9E https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518197/1/modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200
dbproxy1019 4C:D9:8F:6C:9F:2F https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/518203/2/modules/install_server/files/dhcpd/linux-host-entries.ttyS1-115200

Fri, Jun 21, 7:30 AM · Patch-For-Review, Operations, DBA
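When PXE fails, a quick first step is to rule out a MAC mismatch mechanically; here the MACs matched and the cause turned out to be the DNS entries. A Python sketch that compares MACs from ISC dhcpd `host name { hardware ethernet ...; }` declarations against the ones an iDRAC reports, ignoring case (the iDRAC shows uppercase). The layout of the real linux-host-entries file is assumed, not copied, and the function names are hypothetical.

```python
import re
from typing import Dict, List

HOST_RE = re.compile(
    r"host\s+(?P<name>[\w.-]+)\s*\{[^}]*?"
    r"hardware\s+ethernet\s+(?P<mac>[0-9A-Fa-f:]{17})\s*;",
    re.S,
)


def normalize_mac(mac: str) -> str:
    """Lowercase and use ':' separators so iDRAC and dhcpd formats compare equal."""
    return mac.strip().lower().replace("-", ":")


def dhcp_macs(config_text: str) -> Dict[str, str]:
    """Map host name -> normalized MAC for each dhcpd host declaration."""
    return {m.group("name"): normalize_mac(m.group("mac"))
            for m in HOST_RE.finditer(config_text)}


def mac_mismatches(config_text: str, observed: Dict[str, str]) -> List[str]:
    """Hosts whose observed MAC is missing from, or differs from, the config."""
    configured = dhcp_macs(config_text)
    return sorted(host for host, mac in observed.items()
                  if configured.get(host) != normalize_mac(mac))
```

An empty result, as here, points the investigation away from DHCP and towards VLANs or DNS.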
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Fri, Jun 21, 7:29 AM · Patch-For-Review, Operations, DBA
Marostegui updated the task description for T225704: eqiad: rack/setup/install (4) dbproxy systems..
Fri, Jun 21, 6:48 AM · Patch-For-Review, Operations, DBA
Marostegui closed T225884: db2084 temporary correctable hardware errors as Resolved.

And it finally cleared up

23:38:30 <+icinga-wm> RECOVERY - EDAC syslog messages on db2084 is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/host-overview?orgId=1&var-server=db2084&var-datasource=codfw+prometheus/ops
Fri, Jun 21, 4:57 AM · Operations, DBA

Thu, Jun 20

Marostegui added a comment to T225889: Degraded RAID on db2043.

And the disk failed again

Thu, Jun 20, 4:54 PM · DBA, Operations, ops-codfw
Marostegui closed T226194: Degraded RAID on db2043 as Declined.

Duplicate of T225889

Thu, Jun 20, 3:59 PM · Operations, ops-codfw
Marostegui added a comment to T225704: eqiad: rack/setup/install (4) dbproxy systems..

@RobH if you add the production DNS entries, I can take care of the installations myself

Thu, Jun 20, 3:27 PM · Patch-For-Review, Operations, DBA
Marostegui closed T225902: Degraded RAID on db2058 as Resolved.

The RAID is back to Optimal!

root@db2058:~# hpssacli controller all show config
Thu, Jun 20, 3:26 PM · DBA, Operations, ops-codfw
Marostegui added a comment to T225889: Degraded RAID on db2043.

@Papaul has removed and re-inserted the disk and it is rebuilding again.
Let's see whether it goes fine this time or we have to replace it completely

root@db2043:~#  hpssacli controller all show config
Thu, Jun 20, 2:19 PM · DBA, Operations, ops-codfw
Marostegui closed T226186: Degraded RAID on db2043 as Declined.

Duplicate of T225889

Thu, Jun 20, 2:09 PM · Operations, ops-codfw
Marostegui reassigned T225889: Degraded RAID on db2043 from Marostegui to Papaul.

The disk has failed - can we try a different one?

root@db2043:~#  hpssacli controller all show config
Thu, Jun 20, 2:07 PM · DBA, Operations, ops-codfw
Marostegui claimed T225902: Degraded RAID on db2058.
Thu, Jun 20, 2:07 PM · DBA, Operations, ops-codfw
Marostegui added a comment to T225902: Degraded RAID on db2058.

Sorry, this was for db2043

Thu, Jun 20, 2:06 PM · DBA, Operations, ops-codfw
Marostegui reassigned T225902: Degraded RAID on db2058 from Marostegui to Papaul.

The disk failed, can we try another one? Thanks!

physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed)
Thu, Jun 20, 2:05 PM · DBA, Operations, ops-codfw
Marostegui added a comment to T225902: Degraded RAID on db2058.

Thanks!
It is rebuilding

root@db2058:~# hpssacli controller all show config
Thu, Jun 20, 2:04 PM · DBA, Operations, ops-codfw
Marostegui closed T225643: Schema change to oathauth_users as Resolved.

All the fishbowl wikis are done:

for i in $(awk -F "." '{print $1}' s3_fishbowl); do echo "$i"; mysql.py -hdb1123 "$i" -e "show create table oathauth_users\G" | grep -E "module|data"; done
amwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
cnwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
donatewiki
  `module` varbinary(255) NOT NULL,
  `data` blob,
fixcopyrightwiki
  `module` varbinary(255) NOT NULL,
  `data` blob,
foundationwiki
  `module` varbinary(255) NOT NULL,
  `data` blob,
hiwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
idwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
maiwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
nostalgiawiki
  `module` varbinary(255) NOT NULL,
  `data` blob,
punjabiwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
romdwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
rswikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
votewiki
  `module` varbinary(255) NOT NULL,
  `data` blob,
wbwikimedia
  `module` varbinary(255) NOT NULL,
  `data` blob,
Thu, Jun 20, 10:40 AM · MediaWiki-Database, DBA, MediaWiki-extensions-OATHAuth
Marostegui updated the task description for T225643: Schema change to oathauth_users.
Thu, Jun 20, 10:39 AM · MediaWiki-Database, DBA, MediaWiki-extensions-OATHAuth
Marostegui closed T225981: Replace db1077 with db1112 as Resolved.

db1077 is now replicating from db1111 in the test-s4 cluster.
The temporary data has also been removed from dbprov1001

Thu, Jun 20, 9:28 AM · DBA
Marostegui added a comment to T225988: decommission db2039.

Please mark disk #3 as broken so it doesn't get re-used T226155: Degraded RAID on db2039

Thu, Jun 20, 7:46 AM · DC-Ops, ops-codfw, Operations, decommission
Marostegui updated the task description for T225988: decommission db2039.
Thu, Jun 20, 7:46 AM · DC-Ops, ops-codfw, Operations, decommission