Marostegui (Manuel Aróstegui)
User

User Details

User Since
Sep 1 2016, 6:48 AM (129 w, 1 d)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF) [ Global Accounts ]

TZ: UTC +1/+2

Recent Activity

Today

Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

I have run the ALTER on the codfw DC for the s5 section:

cebwiki
dewiki
enwikivoyage
mgwiktionary
shwiki
srwiki
Fri, Feb 22, 8:37 AM · AbuseFilter, DBA, Blocked-on-schema-change
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Fri, Feb 22, 8:36 AM · AbuseFilter, DBA, Blocked-on-schema-change
Marostegui updated the task description for T86342: Dropping page.page_no_title_convert on wmf databases.
Fri, Feb 22, 7:01 AM · Blocked-on-schema-change, DBA, Schema-change
Marostegui updated the task description for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Fri, Feb 22, 6:56 AM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
Marostegui added a comment to T211668: mw1272 crashed: Bad page map in process hhvm.

This host crashed today again:

-------------------------------------------------------------------------------
Record:      40
Date/Time:   02/22/2019 06:10:16
Source:      system
Severity:    Ok
Description: A problem was detected related to the previous server boot.
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Record:      74
Date/Time:   02/22/2019 06:10:18
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      75
Date/Time:   02/22/2019 06:12:12
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
Record:      76
Date/Time:   02/22/2019 06:14:41
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
Fri, Feb 22, 6:51 AM · serviceops, ops-eqiad, Operations, HHVM
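
The event log excerpt above can be filtered by severity with a few lines of code. This is a minimal sketch, assuming the log is plain text made of `Key: value` lines separated by dashed rules, as in the excerpt; `parse_records` and `SAMPLE` are illustrative names and not part of any vendor tool.

```python
import re

# Two records copied from the excerpt above.
SAMPLE = """\
-------------------------------------------------------------------------------
Record:      75
Date/Time:   02/22/2019 06:12:12
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
Record:      76
Date/Time:   02/22/2019 06:14:41
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
"""


def parse_records(text):
    """Split the log on dashed separator lines and return one dict per record."""
    records = []
    for block in re.split(r"^-{5,}[ \t]*$", text, flags=re.MULTILINE):
        fields = {}
        for line in block.strip().splitlines():
            # Split on the first colon only, so timestamps keep their colons.
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


# Keep only the records whose severity is something other than "Ok".
not_ok = [r for r in parse_records(SAMPLE) if r["Severity"] != "Ok"]
for record in not_ok:
    print(record["Date/Time"], record["Severity"], record["Description"])
```
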
Marostegui added a comment to T51199: Add index log_type_action.

No, those are all the queries involving the logging table that were logged on sys, that's why I said I didn't think it would be too useful :(

Fri, Feb 22, 6:34 AM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Blocked-on-schema-change, MediaWiki-Database, Schema-change
Marostegui added a comment to T216170: toolsdb - Per-user connection limits.

@Marostegui I see no subscribes or triggers on a quick pass in puppet, so if I'm not wrong I can change the config with puppet without auto-reloading or puppet restarting the server, right?

Fri, Feb 22, 6:07 AM · Patch-For-Review, cloud-services-team (Kanban), Toolforge, Data-Services
Marostegui closed T216670: Degraded RAID on db2050 as Resolved.

All good now, thank you!

logicaldrive 1 (3.3 TB, RAID 1+0, OK)
Fri, Feb 22, 6:03 AM · DBA, Operations, ops-codfw

Yesterday

Marostegui added a comment to T214720: db1114 crashed (HW memory issues).

@jcrespo maybe we can leave mydumper running 24x7 in a loop for days on that host: dump everything, delete the backup files, dump everything again, and so forth.

Thu, Feb 21, 8:20 PM · Patch-For-Review, DBA, ops-eqiad, Operations
Marostegui added a comment to T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID.

Thanks for checking, although now that I think about it, it is pretty much the same thing: it will time out anyway (as we have seen) :-)

Thu, Feb 21, 4:26 PM · MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), Patch-For-Review, DBA, MediaWiki-API
Marostegui added a comment to T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID.

Change 491993 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] ApiQueryUserContribs: Only use 'contributions' replica if querying by user ID

https://gerrit.wikimedia.org/r/491993

Thu, Feb 21, 4:09 PM · MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), Patch-For-Review, DBA, MediaWiki-API
Marostegui created P8116 (An Untitled Masterwork).
Thu, Feb 21, 3:39 PM
Marostegui added a comment to T216670: Degraded RAID on db2050.

Thanks!

logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete)
Thu, Feb 21, 3:28 PM · DBA, Operations, ops-codfw
Marostegui renamed T214720: db1114 crashed (HW memory issues) from db1114 crashed to db1114 crashed (HW memory issues).
Thu, Feb 21, 2:37 PM · Patch-For-Review, DBA, ops-eqiad, Operations
Marostegui moved T187295: Apply AbuseFilter patch-fix-index from Backlog to In progress on the Blocked-on-schema-change board.
Thu, Feb 21, 2:05 PM · AbuseFilter, DBA, Blocked-on-schema-change
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Thu, Feb 21, 2:05 PM · AbuseFilter, DBA, Blocked-on-schema-change
Marostegui removed a project from T215616: Improve interlingual links across wikis through Wikidata IDs: DBA.

Going to remove the DBA tag from here as there are not really any actionables (yet) for the DBAs and we already provided some input here (T215616#4946564) and there is not much we can do about this at the moment.
I am leaving the MediaWiki-Database tag in case you want to discuss queries or even schema changes (then I would suggest you add Schema-change once you have some thoughts or proposals about it).
Lastly, I will remain subscribed to this task in case you need further help from us!

Thu, Feb 21, 1:06 PM · MediaWiki-Database, Wikidata, Analytics, Research
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

Thank you!

Thu, Feb 21, 11:24 AM · AbuseFilter, DBA, Blocked-on-schema-change
Marostegui moved T187295: Apply AbuseFilter patch-fix-index from Next to In progress on the DBA board.

@Daimona I have done a quick grep on mediawiki-extensions-AbuseFilter and on mediawiki-core repo to make sure there are no FORCE INDEX on any of the following ones:

afl_filter
afl_user
afl_namespace
afl_ip
Thu, Feb 21, 11:19 AM · AbuseFilter, DBA, Blocked-on-schema-change
Marostegui claimed T187295: Apply AbuseFilter patch-fix-index.

As this drift has already created some issues, I will try to work on this as a background task, fixing hosts slowly but steadily.
Now that we can use ADD KEY IF NOT EXISTS and DROP KEY IF EXISTS it will be slightly easier; however, at first glance there is a lot of drift, even between hosts in the same section.

Thu, Feb 21, 11:10 AM · AbuseFilter, DBA, Blocked-on-schema-change
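
The `ADD KEY IF NOT EXISTS` and `DROP KEY IF EXISTS` clauses mentioned above make an ALTER safe to re-run on hosts that have drifted differently. A minimal sketch of generating such a statement follows; the helper `idempotent_index_alter` and the table, index and column names are hypothetical placeholders, not the actual AbuseFilter schema change, and the names are assumed to be trusted (no identifier escaping is done).

```python
def idempotent_index_alter(table, add=None, drop=None):
    """Build one ALTER TABLE statement that can be re-run without failing.

    add:  mapping of index name -> list of column names (hypothetical inputs)
    drop: list of index names to remove
    """
    add = add or {}
    drop = drop or []
    # Drops go first so an index can be dropped and re-created in one statement.
    clauses = [f"DROP KEY IF EXISTS `{name}`" for name in drop]
    for name, columns in add.items():
        column_list = ", ".join(f"`{col}`" for col in columns)
        clauses.append(f"ADD KEY IF NOT EXISTS `{name}` ({column_list})")
    if not clauses:
        raise ValueError("nothing to add or drop")
    return f"ALTER TABLE `{table}` " + ", ".join(clauses) + ";"


# Placeholder names only; substitute the real table and index definitions.
print(idempotent_index_alter(
    "example_table",
    add={"example_idx": ["example_col"]},
    drop=["old_idx"],
))
```

Because each clause is a no-op when the index is already in the desired state, the same statement can be applied to every host in a section regardless of which indexes it currently has.
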
Marostegui updated the task description for T214264: BBU issues on codfw.
Thu, Feb 21, 10:43 AM · Patch-For-Review, DBA
Marostegui updated the task description for T214264: BBU issues on codfw.
Thu, Feb 21, 10:39 AM · Patch-For-Review, DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Thu, Feb 21, 10:24 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

All the hosts are done except db1067 (s1 master T210713#4967984 ) which I will try a few more times before stalling this until we do a failover.

Thu, Feb 21, 10:07 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Thu, Feb 21, 10:06 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210992: Increase parsercache keys TTL from 22 days back to 30 days.

There is no significant increase visible on the graphs, but 2 days might also be too short to notice anything.

Thu, Feb 21, 9:44 AM · Performance-Team (Radar), Patch-For-Review, Operations, DBA
Marostegui added a comment to T210992: Increase parsercache keys TTL from 22 days back to 30 days.

In a couple of days it will be a month since I switched the TTL from 22 days to 24. There have not been any issues with this, so on Monday I think I will go from 24 to 30 as planned unless someone has any objection.
Thanks!

Thu, Feb 21, 9:37 AM · Performance-Team (Radar), Patch-For-Review, Operations, DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Thu, Feb 21, 9:31 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui triaged T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID as Normal priority.
Thu, Feb 21, 6:29 AM · MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), Patch-For-Review, DBA, MediaWiki-API
Marostegui added a comment to T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID.

For s2 we can probably decrease the main traffic weight for the rc replicas (db1103 and db1105), as I think the other hosts will have no problem absorbing the traffic, but this is another case where "special" slaves are a snowflake and bite us :-(

Thu, Feb 21, 6:25 AM · MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), Patch-For-Review, DBA, MediaWiki-API
Marostegui added a comment to T51199: Add index log_type_action.

I don't think this is too useful https://phabricator.wikimedia.org/P8114 :(

Thu, Feb 21, 6:15 AM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Blocked-on-schema-change, MediaWiki-Database, Schema-change
Marostegui moved T216670: Degraded RAID on db2050 from Triage to In progress on the DBA board.
Thu, Feb 21, 6:02 AM · DBA, Operations, ops-codfw
Marostegui assigned T216670: Degraded RAID on db2050 to Papaul.

Let's get the disk changed @Papaul - thanks!

Thu, Feb 21, 6:02 AM · DBA, Operations, ops-codfw

Wed, Feb 20

Marostegui added a comment to T216635: MySQL database on deployment-db03 does not start due to InnoDB issue.

Data looks very corrupted. At this point the best option is to rebuild that host from the slave.

Wed, Feb 20, 8:45 PM · Release-Engineering-Team, DBA, Beta-Cluster-Infrastructure
Marostegui added a comment to P8111 innodb-force-recovery=1 on deployment-db03 for T216635.

Data looks very corrupted. At this point the best option is to rebuild that host from the slave

Wed, Feb 20, 8:44 PM
Marostegui added a comment to T216635: MySQL database on deployment-db03 does not start due to InnoDB issue.

Anything on dmesg?
Can you do a touch /srv/test?

Wed, Feb 20, 5:36 PM · Release-Engineering-Team, DBA, Beta-Cluster-Infrastructure
Marostegui added a comment to T51199: Add index log_type_action.

It is certainly being used for some queries, I can see this counter increasing:

root@db1089.eqiad.wmnet[sys]> select rows_selected,select_latency from x$schema_index_statistics where table_name='logging' and index_name like 'log_title_%';
+---------------+----------------+
| rows_selected | select_latency |
+---------------+----------------+
|         14478 |  2739429620928 |
|           225 |    98277884848 |
+---------------+----------------+
2 rows in set (0.05 sec)
Wed, Feb 20, 5:24 PM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Blocked-on-schema-change, MediaWiki-Database, Schema-change
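
The `select_latency` values in the output above are raw timer values; the `x$` views of the sys schema report latencies in picoseconds. A small sketch for converting them to seconds, assuming that unit (the function name `latency_seconds` is illustrative):

```python
PICOSECONDS_PER_SECOND = 10**12

# (rows_selected, select_latency) pairs copied from the query output above.
rows = [
    (14478, 2739429620928),
    (225, 98277884848),
]


def latency_seconds(picoseconds):
    """Convert a raw sys-schema latency value (picoseconds) to seconds."""
    return picoseconds / PICOSECONDS_PER_SECOND


for rows_selected, select_latency in rows:
    print(f"{rows_selected} rows selected, {latency_seconds(select_latency):.3f} s total select latency")
```
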
Marostegui added a comment to T216635: MySQL database on deployment-db03 does not start due to InnoDB issue.

Broken storage?:

Feb 18 13:24:54 mysqld[837]: InnoDB: Error number 5 means 'Input/output error'.
Wed, Feb 20, 5:24 PM · Release-Engineering-Team, DBA, Beta-Cluster-Infrastructure
Marostegui created P8108 (An Untitled Masterwork).
Wed, Feb 20, 2:38 PM
Marostegui added a comment to T201133: db1069 (x1 master) memory errors.

Just for the record

db1069
Wed, Feb 20, 10:57 AM · ops-eqiad, Operations, DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Wed, Feb 20, 10:05 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

s3 eqiad

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • dbstore1002
  • db1124
  • db1123
  • db1095
  • db1078
  • db1077
  • db1075
Wed, Feb 20, 10:04 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

db1067 (s1 master) has too much concurrency to let the alter go through. I will try a few more times before giving up on it and leaving it for when we fail over either the master or the DC.

Wed, Feb 20, 10:02 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Wed, Feb 20, 9:55 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T86342: Dropping page.page_no_title_convert on wmf databases.
Wed, Feb 20, 8:52 AM · Blocked-on-schema-change, DBA, Schema-change
Marostegui added a comment to T51199: Add index log_type_action.

I have been taking a look at these indexes on enwiki, and we have two indexes in production that are not on tables.sql:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`),
Wed, Feb 20, 8:33 AM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Blocked-on-schema-change, MediaWiki-Database, Schema-change
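
A comparison like the one described above can be scripted by extracting index names from two `CREATE TABLE` bodies and diffing the sets. This is a rough sketch: the two `log_title_*` definitions are copied from the comment, while `example_shared_idx`, `example_col` and the contents of `reference` are made-up placeholders standing in for the real tables.sql, which is not reproduced here.

```python
import re

# Matches "KEY `name`" and "UNIQUE KEY `name`" at the start of a line.
KEY_RE = re.compile(r"^\s*(?:UNIQUE\s+)?KEY\s+`([^`]+)`", re.MULTILINE)


def index_names(ddl):
    """Return the set of index names declared in a CREATE TABLE body."""
    return set(KEY_RE.findall(ddl))


# Index definitions as seen on a production host.
production = """
KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`),
KEY `example_shared_idx` (`example_col`),
"""

# Hypothetical stand-in for the reference schema file.
reference = """
KEY `example_shared_idx` (`example_col`),
"""

only_in_production = index_names(production) - index_names(reference)
print(sorted(only_in_production))
```
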
Marostegui moved T187295: Apply AbuseFilter patch-fix-index from Backlog to Next on the DBA board.
Wed, Feb 20, 7:18 AM · AbuseFilter, DBA, Blocked-on-schema-change

Tue, Feb 19

Marostegui added a project to T216526: Degraded RAID on cloudvirt1018: cloud-services-team.
Tue, Feb 19, 5:26 PM · cloud-services-team, ops-eqiad, Operations
Marostegui added a comment to T172410: Replace the current multisource analytics-store setup.

I just noticed that the tables related to the Echo extension are (surprisingly) not yet available in the enwiki shard (s1-analytics-replica.eqiad.wmnet), but are in analytics-store.eqiad.wmnet. Is there a page we can refer to to check on parity/status of data availability?

Tue, Feb 19, 4:29 PM · Product-Analytics, Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
Marostegui closed T216273: New cronspam from db clusters as Resolved.

Nothing has arrived since the restart without debug, so I think we are good

Tue, Feb 19, 2:09 PM · Operations
Marostegui closed T216273: New cronspam from db clusters, a subtask of T132324: Tracking and Reducing cron-spam to root@ , as Resolved.
Tue, Feb 19, 2:09 PM · Patch-For-Review, Operations
Marostegui added a comment to T149077: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki.

Could this be another case of MariaDB getting the optimizer fixed with a new version as it doesn't happen on 10.1.36 or 10.1.37 for the original query?

root@db1070.eqiad.wmnet[dewiki]> EXPLAIN SELECT /* ApiQueryRecentChanges::run */ rc_id, rc_timestamp, rc_namespace, rc_title, rc_cur_id, rc_type, rc_deleted, rc_this_oldid, rc_last_oldid FROM `recentchanges` WHERE (rc_timestamp>='20161024013525') AND rc_namespace IN ('0', '120') AND rc_type IN ('0', '1', '3', '6') ORDER BY rc_timestamp ASC, rc_id ASC LIMIT 101\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: range
possible_keys: rc_timestamp,rc_ns_usertext,rc_name_type_patrolled_timestamp,rc_ns_actor,rc_namespace_title_timestamp
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 518658
        Extra: Using index condition; Using where
1 row in set (0.00 sec)
Tue, Feb 19, 1:47 PM · Core Platform Team Kanban (Waiting for Review), Core Platform Team Backlog (Watching / External), Wikimedia-production-error, Patch-For-Review, MediaWiki-API, DBA
Marostegui added a subtask for T216491: Decommission dbstore1002: T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Tue, Feb 19, 10:44 AM · Patch-For-Review, ops-eqiad, Operations, Analytics
Marostegui added a parent task for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5]: T216491: Decommission dbstore1002.
Tue, Feb 19, 10:44 AM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
Marostegui changed the status of T216491: Decommission dbstore1002 from Open to Stalled.
Tue, Feb 19, 10:44 AM · Patch-For-Review, ops-eqiad, Operations, Analytics
Marostegui created T216491: Decommission dbstore1002.
Tue, Feb 19, 10:44 AM · Patch-For-Review, ops-eqiad, Operations, Analytics
Marostegui lowered the priority of T213670: dbstore1002 Mysql errors from High to Low.

Reducing priority: the errors on dbstore1002 are no longer very important, as this host shouldn't be used anymore and everything using it should migrate to the new hosts (T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5]).

Tue, Feb 19, 10:39 AM · Patch-For-Review, Operations, Product-Analytics, Analytics-Kanban, Analytics
Marostegui added a comment to T213670: dbstore1002 Mysql errors.

For what it's worth, dbstore1002 is now lagging 7 days behind on s8 (wikidatawiki) and it keeps lagging; I doubt it will ever catch up.
Yesterday the migration of the staging database to dbstore1003-1005 happened (T210478#4963411), so everyone should start using that one as soon as possible, especially after seeing so many crashes, lags that will never recover and corrupted data (due to the above crashes).

Tue, Feb 19, 10:37 AM · Patch-For-Review, Operations, Product-Analytics, Analytics-Kanban, Analytics
Marostegui added a comment to T215589: Migrate users to dbstore100[3-5].

For what it's worth, dbstore1002 is now lagging 7 days behind on s8 (wikidatawiki) and it keeps lagging; I doubt it will ever catch up.

Tue, Feb 19, 10:36 AM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui closed T174802: Archive and drop education program (ep_*) tables on all wikis as Resolved.

This is all done.
The only pending follow-up is to remove the views, which has its own task: T216481: Remove views on ep_* tables on the wikireplicas hosts

Tue, Feb 19, 9:23 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui closed T174802: Archive and drop education program (ep_*) tables on all wikis, a subtask of T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking), as Resolved.
Tue, Feb 19, 9:23 AM · Epic, DBA, Tracking
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Tue, Feb 19, 9:22 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Tue, Feb 19, 8:41 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui triaged T216481: Remove views on ep_* tables on the wikireplicas hosts as Normal priority.
Tue, Feb 19, 8:36 AM · Patch-For-Review, cloud-services-team (Kanban), Data-Services
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Tue, Feb 19, 8:34 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Tue, Feb 19, 8:24 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Tue, Feb 19, 8:06 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui added a comment to T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.

db1106 has been rebooted (and kernel was upgraded)

Tue, Feb 19, 7:56 AM · ops-codfw, Operations, DBA
Marostegui updated the task description for T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.
Tue, Feb 19, 7:56 AM · ops-codfw, Operations, DBA
Marostegui added a comment to T216273: New cronspam from db clusters.

I have rebooted db1106. I will give it some time to confirm the spam is gone before closing this task.

Tue, Feb 19, 7:56 AM · Operations
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Tue, Feb 19, 7:49 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui changed the status of T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed from Open to Stalled.
Tue, Feb 19, 7:15 AM · DBA, Wikimedia-Site-requests
Marostegui changed the status of T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed, a subtask of T169440: Pending global renames in need of sysadmin supervision (tracking), from Open to Stalled.
Tue, Feb 19, 7:14 AM · GlobalRename, MediaWiki-extensions-CentralAuth, Tracking
Marostegui updated the task description for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Tue, Feb 19, 6:10 AM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
Marostegui added a comment to T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].

The migration finished. These are the times in UTC from 18th Feb 2019:

Tue, Feb 19, 6:10 AM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics

Mon, Feb 18

Marostegui added a comment to T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed.

This should wait until T215107 is unblocked and resolved T215107#4962933

Mon, Feb 18, 8:07 PM · DBA, Wikimedia-Site-requests
Marostegui added a comment to T216441: Evaluate transferring the non-replicated tables to the new toolsdb server.

Of course :-). Just mentioning this as an idea to Cloud Team

Mon, Feb 18, 8:05 PM · Data-Services, cloud-services-team (Kanban)
Marostegui added a comment to T216441: Evaluate transferring the non-replicated tables to the new toolsdb server.

Just saying: we have a testing host where we could try to import those databases from labsdb1005 and see if they fail or what fails during the import process.
Let me know if I can help with this.

Mon, Feb 18, 7:55 PM · Data-Services, cloud-services-team (Kanban)
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

There have been no retries from what I can see on: https://logstash.wikimedia.org/goto/65afdb88fef30982130c53e40a644b06

Mon, Feb 18, 5:44 PM · Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

It timed out on Commonswiki:
https://logstash.wikimedia.org/goto/34de73560ce6692f0012e846f7a4de0c
Maybe @Legoktm can help to unblock it?

Mon, Feb 18, 2:56 PM · Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

Go for it!

Mon, Feb 18, 2:41 PM · Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Mon, Feb 18, 2:22 PM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui added a comment to T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020.

I have been talking to @aborrero about the new instance on clouddb1001 - and I have been taking a general look.
While comparing the grants, I have realised that clouddb1001 is missing a grant for the following user: s52716 (that grant exists on labsdb1005). It could be a new user. I can easily copy that grant over to clouddb1001, but I want the green light from @Bstorm just in case this has something to do with maintain-dbusers or something :-)

Mon, Feb 18, 2:21 PM · cloud-services-team (Kanban), Patch-For-Review, Epic, Cloud-VPS
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Mon, Feb 18, 10:50 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui claimed T216273: New cronspam from db clusters.

I will take care of db1106 as I need to depool it anyways today or tomorrow.

Mon, Feb 18, 10:08 AM · Operations
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Mon, Feb 18, 9:32 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Mon, Feb 18, 9:05 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

s1 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • dbstore1002
  • dbstore1001
  • db1124
  • db1119
  • db1118
  • db1106
  • db1105
  • db1099
  • db1089
  • db1083
  • db1080
  • db1067 T210713#4967984
Mon, Feb 18, 9:04 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Mon, Feb 18, 9:02 AM · Patch-For-Review, Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T213406: Purchase and setup remaining hosts for database backups.

I think we can close this, we already have the tasks in place:
T216142
T216138
T216137
T214069
T214066

Mon, Feb 18, 8:37 AM · DBA
Marostegui updated the task description for T174802: Archive and drop education program (ep_*) tables on all wikis.
Mon, Feb 18, 8:32 AM · Patch-For-Review, User-notice, Datasets-General-or-Unknown, Data-Services, DBA
Marostegui added a comment to T216273: New cronspam from db clusters.

db2085 has been rebooted - let's see if that stops the emails.

Mon, Feb 18, 7:01 AM · Operations
Marostegui added a comment to T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.

I have rebooted db2085 without the debug option on the kernel as part of T216273, and I have taken the opportunity to upgrade its kernel too.

Mon, Feb 18, 6:59 AM · ops-codfw, Operations, DBA
Marostegui updated the task description for T216240: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092.
Mon, Feb 18, 6:58 AM · ops-codfw, Operations, DBA
Marostegui removed a project from T216213: s52481__stats_global running CREATE DATABASE IF NOT EXISTS on too many queries causing locking issues: DBA.
Mon, Feb 18, 6:18 AM · Data-Services, Tracking, Toolforge

Sun, Feb 17

Marostegui edited projects for T216353: toolsdb: firewalling changes for new setup (temporal mysql replication), added: User-Marostegui; removed DBA.
Sun, Feb 17, 7:26 PM · User-Marostegui, netops, Operations, cloud-services-team (Kanban), Cloud-VPS

Sat, Feb 16

Marostegui added a comment to T216273: New cronspam from db clusters.

We probably just need to reboot them without the kernel running in debug mode, as discussed on Friday.

Sat, Feb 16, 9:03 PM · Operations

Fri, Feb 15

Marostegui added a comment to T215589: Migrate users to dbstore100[3-5].

Ah, never mind my comment, you decided to completely move away from dbstore1002 :-)
Thanks!

Fri, Feb 15, 9:23 PM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui added a comment to T215589: Migrate users to dbstore100[3-5].

Another staging database where? Just to clarify: dbstore1002 will be fully read-only after the migration (MySQL doesn't allow setting read-only at the database level; it is a global flag).

Fri, Feb 15, 9:21 PM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui added a comment to T216133: Increase visibility of auto-generated tasks for RAID errors.

For what it's worth, I do have a Herald rule that automatically subscribes me to any degraded RAID ticket for the databases, and that has proved to be a good way to get my attention, as otherwise monitoring the Operations queue is hard and it is easy to miss things.

Fri, Feb 15, 5:41 PM · DC-Ops, Operations, Wikimedia-Incident, cloud-services-team (Kanban)