Feb 27 2019

Marostegui closed T215107: Global rename of The_Photographer → Wilfredor: supervision needed as Resolved.
Feb 27 2019, 6:01 AM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui reassigned T215107: Global rename of The_Photographer → Wilfredor: supervision needed from MarcoAurelio to Tgr.

Oh, right, this is T188882: Attachment method should be preserved through global rename; let's follow up there. The accounts should be usable; this is more of a display bug.

Feb 27 2019, 6:01 AM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui closed T215107: Global rename of The_Photographer → Wilfredor: supervision needed, a subtask of T169440: Pending global renames in need of sysadmin supervision (tracking), as Resolved.
Feb 27 2019, 6:01 AM · GlobalRename, MediaWiki-extensions-CentralAuth, Tracking-Neverending
Marostegui changed the status of T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed from Stalled to Open.

When do you want to do this?

Feb 27 2019, 5:59 AM · DBA, Wikimedia-Site-requests
Marostegui changed the status of T216444: Global rename of Дагиров Умар → Takhirgeran Umar: supervision needed, a subtask of T169440: Pending global renames in need of sysadmin supervision (tracking), from Stalled to Open.
Feb 27 2019, 5:59 AM · GlobalRename, MediaWiki-extensions-CentralAuth, Tracking-Neverending

Feb 26 2019

Marostegui created P8130 (An Untitled Masterwork).
Feb 26 2019, 10:43 AM
Marostegui added a comment to T86342: Dropping page.page_no_title_convert on wmf databases.

s2 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • dbstore1002
  • db1125
  • db1122
  • db1105
  • db1103
  • db1095
  • db1090
  • db1076
  • db1074
  • db1066
Feb 26 2019, 10:11 AM · Schema-change-in-production, DBA, Schema-change
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

@Marostegui Thanks, that's nice to hear! Given that queries like that one are pretty common, this is surely a huge performance boost.

Feb 26 2019, 9:41 AM · AbuseFilter, DBA, Schema-change-in-production
jcrespo awarded T187295: Apply AbuseFilter patch-fix-index a Love token.
Feb 26 2019, 8:54 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 26 2019, 8:07 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui changed the status of T210713: Drop change_tag.ct_tag column in production from Open to Stalled.

Stalling this until we have failed over the s1 master, as it is impossible to alter that host whilst it is active.

Feb 26 2019, 8:02 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui changed the status of T210713: Drop change_tag.ct_tag column in production, a subtask of T194163: Drop change_tag.ct_tag column, from Open to Stalled.
Feb 26 2019, 8:02 AM · Wikidata, MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), User-Ladsgroup, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞ (On Hold)), MediaWiki-libs-Rdbms, MediaWiki-Change-tagging
Marostegui moved T86342: Dropping page.page_no_title_convert on wmf databases from Backlog to In progress on the Schema-change-in-production board.
Feb 26 2019, 7:32 AM · Schema-change-in-production, DBA, Schema-change
Marostegui moved T215107: Global rename of The_Photographer → Wilfredor: supervision needed from Blocked external/Not db team to Done on the DBA board.
Feb 26 2019, 7:32 AM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui updated the task description for T86342: Dropping page.page_no_title_convert on wmf databases.
Feb 26 2019, 7:23 AM · Schema-change-in-production, DBA, Schema-change
Marostegui updated the task description for T86342: Dropping page.page_no_title_convert on wmf databases.
Feb 26 2019, 7:01 AM · Schema-change-in-production, DBA, Schema-change
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

Thanks @Tgr!
@Wilfredor can you try to log in now?

Feb 26 2019, 6:51 AM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 26 2019, 6:26 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

s1 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • dbstore1002
  • dbstore1001
  • db1124
  • db1119
  • db1118
  • db1106
  • db1105
  • db1099
  • db1089
  • db1083
  • db1080
  • db1067
Feb 26 2019, 6:26 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

I used the following query on db1083 to measure the impact of the index change (I executed the query twice to make sure it was "warm"):

SELECT /* IndexPager::buildQueryInfo (AbuseLogPager) xx */ * FROM `abuse_filter_log` LEFT JOIN `abuse_filter` ON ((af_id=afl_filter)) WHERE afl_filter = '423' AND ((afl_deleted = '0') OR (afl_deleted IS NULL)) ORDER BY afl_timestamp DESC LIMIT 51;
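A minimal sketch of why an index change matters for this query shape, assuming SQLite as a stand-in for MariaDB and a heavily simplified `abuse_filter_log` table (only the columns the query filters and sorts on; the LEFT JOIN is omitted). The index name `afl_filter_timestamp` and its column order are illustrative assumptions, not the definition from patch-fix-index. It compares query plans: without a composite index the engine scans and sorts; with one on (afl_filter, afl_timestamp) it can read the 51 rows already in order.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row has four columns; the last one is the human-readable detail.
    return " | ".join(row[3] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE abuse_filter_log ("
    " afl_id INTEGER PRIMARY KEY,"
    " afl_filter TEXT,"
    " afl_timestamp TEXT,"
    " afl_deleted INTEGER)"
)

QUERY = (
    "SELECT * FROM abuse_filter_log"
    " WHERE afl_filter = '423'"
    " AND (afl_deleted = 0 OR afl_deleted IS NULL)"
    " ORDER BY afl_timestamp DESC LIMIT 51"
)

before = plan(conn, QUERY)  # full table scan plus a temporary B-tree for ORDER BY
conn.execute(
    "CREATE INDEX afl_filter_timestamp"
    " ON abuse_filter_log (afl_filter, afl_timestamp)"
)
after = plan(conn, QUERY)   # index search; rows come out already ordered

print("before:", before)
print("after: ", after)
```

On the real hosts the equivalent check would be EXPLAIN plus timing the warm query, as done on db1083; SQLite is only used here so the example runs anywhere.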
Feb 26 2019, 6:25 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui moved T217073: Clean up orphaned echo_event rows again from Triage to Blocked external/Not db team on the DBA board.

Thanks for the heads up, that works for me!

Feb 26 2019, 6:03 AM · Growth-Team (Sprint 0 (Growth Team)), DBA, Essential-Work, Notifications
Marostegui added a comment to T215107: Global rename of The_Photographer → Wilfredor: supervision needed.

@MarcoAurelio green light from us to get the job rescheduled

Feb 26 2019, 5:59 AM · MW-1.33-notes (1.33.0-wmf.18; 2019-02-19), Patch-For-Review, User-MarcoAurelio, DBA, Wikimedia-Site-requests

Feb 25 2019

Marostegui added a comment to T214720: db1114 crashed (HW memory issues).

Excellent! Thank you Chris!

Feb 25 2019, 4:32 PM · Patch-For-Review, DBA, SRE, ops-eqiad
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

Yeah, I will ask for that once we have more replicas altered; otherwise you might sometimes reach one that is altered and other times one that is not :)
Thank you!

Feb 25 2019, 4:07 PM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

Nothing weird seems to have happened during the last 6 hours, and this is good. However, I guess that any possible trouble won't be visible before reaching enwiki...

Feb 25 2019, 4:03 PM · AbuseFilter, DBA, Schema-change-in-production
Marostegui moved T72005: Apply enum changes to (img|oi|fa)_major_mime on production from Backlog to Pending comment on the DBA board.
Feb 25 2019, 2:23 PM · DBA, Schema-change
Marostegui moved T71127: Discrepancies with logging table on different wikis from Backlog to Pending comment on the DBA board.
Feb 25 2019, 2:23 PM · Data-Services, DBA
Marostegui moved T205626: Document clearly the mariadb backup and recovery setup from Backlog to Pending comment on the DBA board.
Feb 25 2019, 2:23 PM · DBA
Marostegui moved T86342: Dropping page.page_no_title_convert on wmf databases from Pending comment to In progress on the DBA board.
Feb 25 2019, 2:01 PM · Schema-change-in-production, DBA, Schema-change
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

@Daimona the following wikis are now fully altered. I am going to give it a few hours before continuing and will monitor tendril:
s2:
bgwiki
bgwiktionary
cswiki
enwikiquote
enwiktionary
eowiki
fiwiki
idwiki
itwiki
nlwiki
nowiki
plwiki
ptwiki
svwiki
thwiki
trwiki
zhwiki

Feb 25 2019, 10:14 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 25 2019, 10:12 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

s2 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • dbstore1002
  • db1125
  • db1122
  • db1105
  • db1103
  • db1095
  • db1090
  • db1076
  • db1074
  • db1066
Feb 25 2019, 10:12 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

Thanks @Daimona!
Is this an example of a slow query?

SELECT /* IndexPager::buildQueryInfo (AbuseLogPager) xxx */ * FROM `abuse_filter_log` LEFT JOIN `abuse_filter` ON ((af_id=afl_filter)) WHERE afl_filter = '550' AND ((afl_deleted = '0') OR (afl_deleted IS NULL)) ORDER BY afl_timestamp LIMIT 51 /*

Uhm, has it been reported as slow? I couldn't spot it, if so. Yes, this query being slow could be a side-effect of changing indexes.

Feb 25 2019, 9:17 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T210992: Increase parsercache keys TTL from 22 days back to 30 days.

I have increased the TTL back to 30 days.
Going to monitor the graphs for a few days before closing this.

Feb 25 2019, 9:15 AM · Performance-Team (Radar), SRE, DBA
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 25 2019, 8:57 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 25 2019, 8:48 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

s6 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1005
  • dbstore1002
  • dbstore1001
  • db1125
  • db1113
  • db1098
  • db1096
  • db1093
  • db1088
  • db1085
  • db1061
Feb 25 2019, 8:47 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 25 2019, 8:09 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T86342: Dropping page.page_no_title_convert on wmf databases.

s6 eqiad progress

Feb 25 2019, 8:06 AM · Schema-change-in-production, DBA, Schema-change
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 25 2019, 8:05 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

Thanks @Daimona!
Is this an example of a slow query?

SELECT /* IndexPager::buildQueryInfo (AbuseLogPager) xxx */ * FROM `abuse_filter_log` LEFT JOIN `abuse_filter` ON ((af_id=afl_filter)) WHERE afl_filter = '550' AND ((afl_deleted = '0') OR (afl_deleted IS NULL)) ORDER BY afl_timestamp LIMIT 51 /*
Feb 25 2019, 8:00 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T86342: Dropping page.page_no_title_convert on wmf databases.
Feb 25 2019, 7:35 AM · Schema-change-in-production, DBA, Schema-change
Marostegui closed T197486: prop=revisions API timing out for a specific user and pages they edited as Resolved.

All the core replicas that receive this query are now running versions newer than 10.1.36, which don't have this optimizer "bug".
The masters aren't running those versions, but they are not receiving (or shouldn't be receiving) these queries, so this is pretty much solved.
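A small sketch of the version check behind this, assuming version strings of the usual `10.1.37-MariaDB` form. Only the 10.1.36 threshold comes from the comment; the host names and versions in the inventory are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '10.1.37-MariaDB-log' into (10, 1, 37)."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))

THRESHOLD = (10, 1, 36)

def affected_hosts(versions: dict[str, str]) -> list[str]:
    """Hosts on 10.1.36 or older, i.e. not strictly newer than the threshold."""
    return sorted(h for h, v in versions.items() if parse_version(v) <= THRESHOLD)

# Hypothetical inventory, for illustration only.
inventory = {
    "replica-a": "10.1.37-MariaDB",
    "replica-b": "10.1.38-MariaDB-log",
    "master-a": "10.1.33-MariaDB",
}
print(affected_hosts(inventory))  # ['master-a']
```

Comparing tuples of integers avoids the classic string-comparison trap where "10.1.4" would sort after "10.1.36".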

Feb 25 2019, 6:31 AM · DBA, MediaWiki-libs-Rdbms, MediaWiki-Action-API
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

s5 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1003
  • dbstore1002
  • db1124
  • db1113
  • db1110
  • db1102
  • db1100
  • db1097
  • db1096
  • db1082
  • db1070
Feb 25 2019, 6:25 AM · AbuseFilter, DBA, Schema-change-in-production

Feb 24 2019

Marostegui added a comment to T213670: dbstore1002 Mysql errors.

mysql crashed last night:

Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbdd6ee]
sql/signal_handler.cc:153(handle_fatal_signal)[0x73dc40]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f77a198a330]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f77a079ec37]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f77a07a2028]
srv/srv0srv.cc:2200(srv_error_monitor_thread)[0x9870aa]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f77a1982184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f77a086603d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
190224 03:20:23 mysqld_safe Number of processes running now: 0
190224 03:20:23 mysqld_safe mysqld restarted
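To keep track of how often this host restarts, a sketch that extracts restart events from error-log lines in the `mysqld_safe` format shown above (a `YYMMDD HH:MM:SS` prefix). The sample lines are copied from the log; counting every "mysqld restarted" line as one crash is an assumption.

```python
import re
from datetime import datetime

# Matches lines such as: "190224 03:20:23 mysqld_safe mysqld restarted"
RESTART = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) mysqld_safe mysqld restarted$")

def restart_times(lines):
    """Return the timestamps of all mysqld restarts found in the log lines."""
    times = []
    for line in lines:
        match = RESTART.match(line.strip())
        if match:
            times.append(datetime.strptime(match.group(1), "%y%m%d %H:%M:%S"))
    return times

log = [
    "190224 03:20:23 mysqld_safe Number of processes running now: 0",
    "190224 03:20:23 mysqld_safe mysqld restarted",
]
print(restart_times(log))  # [datetime.datetime(2019, 2, 24, 3, 20, 23)]
```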
Feb 24 2019, 7:09 AM · Patch-For-Review, SRE, Product-Analytics, Analytics-Kanban, Analytics
Marostegui added a comment to T214720: db1114 crashed (HW memory issues).

@Cmjohnson db1114 crashed again with the same memory errors on the same slots, so it looks like the mainboard memory slots aren't healthy?

Record:      1
Date/Time:   02/21/2019 19:30:12
Source:      system
Severity:    Ok
Description: Log cleared.
-------------------------------------------------------------------------------
Record:      2
Date/Time:   02/23/2019 21:25:36
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B7.
-------------------------------------------------------------------------------
Record:      3
Date/Time:   02/23/2019 21:25:37
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B3.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   02/23/2019 21:25:58
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B7.
-------------------------------------------------------------------------------
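A sketch for summarising hardware event logs like the one above: it splits the output on the dashed separators, parses each record's `Key: value` lines and counts memory-error records per DIMM, which makes "same slots again" easy to confirm across crashes. It assumes the record layout shown here; the sample is records 2 to 4 from this log.

```python
import re
from collections import Counter

DIMM = re.compile(r"memory error rate exceeded for (DIMM_\w+)")
SEPARATOR = re.compile(r"^-{10,}$", re.MULTILINE)

def parse_records(text: str) -> list[dict[str, str]]:
    """Split a SEL dump on its dashed separator lines into key/value dicts."""
    records = []
    for block in SEPARATOR.split(text):
        fields = {}
        for line in block.strip().splitlines():
            # Split on the first colon only, so Date/Time keeps its HH:MM:SS part.
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records

def dimm_error_counts(records: list[dict[str, str]]) -> Counter:
    """Count memory-error records per DIMM slot."""
    counts = Counter()
    for record in records:
        match = DIMM.search(record.get("Description", ""))
        if match:
            counts[match.group(1)] += 1
    return counts

SEL = """\
Record:      2
Date/Time:   02/23/2019 21:25:36
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B7.
-------------------------------------------------------------------------------
Record:      3
Date/Time:   02/23/2019 21:25:37
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B3.
-------------------------------------------------------------------------------
Record:      4
Date/Time:   02/23/2019 21:25:58
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B7.
"""

print(dimm_error_counts(parse_records(SEL)).most_common())
# [('DIMM_B7', 2), ('DIMM_B3', 1)]
```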
Feb 24 2019, 6:49 AM · Patch-For-Review, DBA, SRE, ops-eqiad
Marostegui moved T214720: db1114 crashed (HW memory issues) from Blocked external/Not db team to In progress on the DBA board.
Feb 24 2019, 6:48 AM · Patch-For-Review, DBA, SRE, ops-eqiad

Feb 22 2019

Marostegui removed a subtask for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5]: Unknown Object (Task).
Feb 22 2019, 11:55 AM · Analytics-Radar, Patch-For-Review, User-Banyek, Analytics-Kanban, DBA
Marostegui added a subtask for T172410: Replace the current multisource analytics-store setup: Unknown Object (Task).
Feb 22 2019, 11:54 AM · Analytics-Radar, Product-Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
Marostegui updated the task description for T215589: Migrate users to dbstore100[3-5].
Feb 22 2019, 11:08 AM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui updated the task description for T215589: Migrate users to dbstore100[3-5].
Feb 22 2019, 11:08 AM · User-Marostegui, Analytics-Kanban, Analytics
Marostegui closed T213670: dbstore1002 Mysql errors as Resolved.

MySQL will be stopped on the 4th of March as the final part of the deprecation of this host.
It has been read-only since 18th Feb anyway. The data should not be trusted anymore, as it is heavily corrupted as a result of all the crashes it has had.
I am closing this because this host will no longer be supported; if MySQL crashes, it will be restarted and replication will be started automatically in idempotent mode (T213670#4934489)
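For readers unfamiliar with idempotent mode (`slave_exec_mode=IDEMPOTENT` in MariaDB/MySQL): it suppresses duplicate-key and key-not-found errors on the replica instead of stopping replication. That is what lets replication keep running on a host whose data has already drifted, and also why that data cannot be trusted afterwards. The toy model below illustrates the trade-off; it simply skips conflicting events, which is a simplification (the server's exact handling of each conflict differs).

```python
class DuplicateKey(Exception):
    pass

class KeyNotFound(Exception):
    pass

def apply_events(table: dict, events, idempotent: bool) -> int:
    """Apply (op, key, value) row events to a dict standing in for a table.

    Returns how many conflicting events were suppressed. In strict mode the
    first conflict raises, which is where a real replica would stop.
    """
    suppressed = 0
    for op, key, value in events:
        try:
            if op == "insert":
                if key in table:
                    raise DuplicateKey(key)
                table[key] = value
            elif op in ("update", "delete"):
                if key not in table:
                    raise KeyNotFound(key)
                if op == "update":
                    table[key] = value
                else:
                    del table[key]
        except (DuplicateKey, KeyNotFound):
            if not idempotent:
                raise
            suppressed += 1  # skipped silently in this model: the drift stays
    return suppressed

# A replica that already diverged: row 1 is stale, row 2 is missing.
events = [("insert", 1, "a"), ("update", 2, "b"), ("insert", 3, "c")]
replica = {1: "stale"}
print(apply_events(replica, events, idempotent=True))  # 2
print(replica)  # {1: 'stale', 3: 'c'}
```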

Feb 22 2019, 11:02 AM · Patch-For-Review, SRE, Product-Analytics, Analytics-Kanban, Analytics
Marostegui closed T213670: dbstore1002 Mysql errors, a subtask of T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5], as Resolved.
Feb 22 2019, 11:02 AM · Analytics-Radar, Patch-For-Review, User-Banyek, Analytics-Kanban, DBA
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

I have run the ALTER in the codfw DC for the s5 section:

cebwiki
dewiki
enwikivoyage
mgwiktionary
shwiki
srwiki
Feb 22 2019, 8:37 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 22 2019, 8:36 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T86342: Dropping page.page_no_title_convert on wmf databases.
Feb 22 2019, 7:01 AM · Schema-change-in-production, DBA, Schema-change
Marostegui updated the task description for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Feb 22 2019, 6:56 AM · Analytics-Radar, Patch-For-Review, User-Banyek, Analytics-Kanban, DBA
Marostegui added a comment to T211668: mw1272 crashed: Bad page map in process hhvm.

This host crashed again today:

-------------------------------------------------------------------------------
Record:      40
Date/Time:   02/22/2019 06:10:16
Source:      system
Severity:    Ok
Description: A problem was detected related to the previous server boot.
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Record:      74
Date/Time:   02/22/2019 06:10:18
Source:      system
Severity:    Ok
Description: An OEM diagnostic event occurred.
-------------------------------------------------------------------------------
Record:      75
Date/Time:   02/22/2019 06:12:12
Source:      system
Severity:    Non-Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
Record:      76
Date/Time:   02/22/2019 06:14:41
Source:      system
Severity:    Critical
Description: Correctable memory error rate exceeded for DIMM_B1.
-------------------------------------------------------------------------------
Feb 22 2019, 6:51 AM · serviceops, ops-eqiad, SRE, HHVM
Marostegui added a comment to T51199: Add index log_type_action.

No, those are all the logged queries involving the logging table that were logged in sys; that's why I said I didn't think it would be too useful :(

Feb 22 2019, 6:34 AM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change-in-production, MediaWiki-libs-Rdbms, Schema-change
Marostegui added a comment to T216170: toolsdb - Per-user connection limits.

@Marostegui I see no subscribes or triggers in puppet on a quick pass, so if I'm not wrong I can change the config with puppet without it auto-reloading or restarting the server, right?

Feb 22 2019, 6:07 AM · cloud-services-team (Kanban), Toolforge, Data-Services
Marostegui closed T216670: Degraded RAID on db2050 as Resolved.

All good now, thank you!

logicaldrive 1 (3.3 TB, RAID 1+0, OK)
Feb 22 2019, 6:03 AM · DBA, SRE, ops-codfw

Feb 21 2019

Marostegui added a comment to T214720: db1114 crashed (HW memory issues).

@jcrespo maybe we can leave mydumper running 24x7 in a loop for days on that host: dump everything, delete the backup files, dump everything again, and so forth.
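A minimal sketch of that burn-in loop, assuming `mydumper` is on the PATH and that only its `--outputdir` option is needed (connection options omitted). The command runner is injectable so the loop can be dry-run; the iteration count and directory prefix are placeholders.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def dump_loop(iterations: int, run=subprocess.run, base_dir=None) -> int:
    """Dump everything, delete the files, repeat. Returns completed dumps."""
    base = Path(base_dir or tempfile.mkdtemp(prefix="burnin-"))
    completed = 0
    for i in range(iterations):
        outdir = base / f"dump-{i}"
        # check=True makes a failing dump (e.g. the host crashing) stop the loop.
        run(["mydumper", "--outputdir", str(outdir)], check=True)
        completed += 1
        # Free the space before the next round; the dump content is irrelevant.
        shutil.rmtree(outdir, ignore_errors=True)
    return completed

# Dry run: record the commands instead of executing mydumper.
calls = []
print(dump_loop(3, run=lambda cmd, check: calls.append(cmd)))  # 3
```

For the real test the iteration count would simply be set high enough to cover several days of continuous memory pressure.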

Feb 21 2019, 8:20 PM · Patch-For-Review, DBA, SRE, ops-eqiad
Marostegui added a comment to T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID.

Thanks for checking, although now that I think about it, it is pretty much the same thing: it will time out anyway (as we have seen) :-)

Feb 21 2019, 4:26 PM · Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), DBA, MediaWiki-Action-API
Marostegui added a comment to T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID.

Change 491993 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] ApiQueryUserContribs: Only use 'contributions' replica if querying by user ID

https://gerrit.wikimedia.org/r/491993
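The idea in the patch title, sketched in Python rather than MediaWiki's PHP: route the query to the 'contributions' replica group only when the lookup is by user ID, and leave name-based (`rev_user_text`) lookups on the general replicas. The function and its parameters are hypothetical; the Gerrit change above has the actual code.

```python
from typing import Optional

def query_group(user_id: Optional[int], user_text: Optional[str]) -> Optional[str]:
    """Pick the replica group for a usercontribs query (hypothetical helper).

    Returns 'contributions' only for user-ID lookups; None means
    'use the default replica group'.
    """
    if user_id is not None:
        return "contributions"
    if user_text is not None:
        return None  # name-based lookups go to the general replicas
    raise ValueError("need a user ID or a user name")

print(query_group(42, None), query_group(None, "Example"))  # contributions None
```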

Feb 21 2019, 4:09 PM · Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), DBA, MediaWiki-Action-API
Marostegui created P8116 (An Untitled Masterwork).
Feb 21 2019, 3:39 PM
Marostegui added a comment to T216670: Degraded RAID on db2050.

Thanks!

logicaldrive 1 (3.3 TB, RAID 1+0, Recovering, 2% complete)
Feb 21 2019, 3:28 PM · DBA, SRE, ops-codfw
Marostegui renamed T214720: db1114 crashed (HW memory issues) from db1114 crashed to db1114 crashed (HW memory issues).
Feb 21 2019, 2:37 PM · Patch-For-Review, DBA, SRE, ops-eqiad
Marostegui moved T187295: Apply AbuseFilter patch-fix-index from Backlog to In progress on the Schema-change-in-production board.
Feb 21 2019, 2:05 PM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T187295: Apply AbuseFilter patch-fix-index.
Feb 21 2019, 2:05 PM · AbuseFilter, DBA, Schema-change-in-production
Marostegui removed a project from T215616: Improve interlingual links across wikis through Wikidata IDs: DBA.

Going to remove the DBA tag from here, as there are not really any actionable items (yet) for the DBAs, we have already provided some input here (T215616#4946564), and there is not much we can do about this at the moment.
I am leaving the MediaWiki-libs-Rdbms tag in case you want to discuss queries or even schema changes (then I would suggest you add Schema-change once you have some thoughts or proposals about it).
Lastly, I will remain subscribed to this task in case you need further help from us!

Feb 21 2019, 1:06 PM · Data-Engineering-Icebox, Analytics-Radar, Research-Freezer, MediaWiki-General, Wikidata
Marostegui added a comment to T187295: Apply AbuseFilter patch-fix-index.

Thank you!

Feb 21 2019, 11:24 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui moved T187295: Apply AbuseFilter patch-fix-index from Pending comment to In progress on the DBA board.

@Daimona I have done a quick grep on the mediawiki-extensions-AbuseFilter and mediawiki-core repos to make sure there is no FORCE INDEX on any of the following indexes:

afl_filter
afl_user
afl_namespace
afl_ip
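The same check can be scripted. A rough sketch, assuming index hints appear on the same line as the index name, either as raw `FORCE INDEX`/`USE INDEX` SQL or as a `'USE INDEX'` query option; hints split across lines would be missed, so this is a first pass, not a guarantee.

```python
import re
from pathlib import Path

INDEXES = ("afl_filter", "afl_user", "afl_namespace", "afl_ip")
HINT = re.compile(r"(FORCE|USE)\s+INDEX", re.IGNORECASE)
# Word boundaries keep e.g. "afl_filter_timestamp" from matching "afl_filter".
NAMES = re.compile(r"\b(" + "|".join(INDEXES) + r")\b")

def find_hints(lines):
    """Yield (line_number, index_name) for lines that hint one of INDEXES."""
    for number, line in enumerate(lines, start=1):
        if HINT.search(line):
            for name in NAMES.findall(line):
                yield number, name

def scan_repo(root: str, suffixes=(".php", ".sql")):
    """Scan a local checkout for index hints; returns (path, line, index)."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            lines = path.read_text(errors="replace").splitlines()
            hits.extend((str(path), n, name) for n, name in find_hints(lines))
    return hits
```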
Feb 21 2019, 11:19 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui claimed T187295: Apply AbuseFilter patch-fix-index.

As this drift has already caused some issues, I will try to work on this as a background task, fixing hosts slowly but steadily.
Now that we can use ADD KEY IF NOT EXISTS and DROP KEY IF EXISTS it will be slightly easier; however, at first glance there is a lot of drift, even between hosts in the same section.
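A sketch of how the IF [NOT] EXISTS forms help with drift: given the indexes a host should have and the ones it actually has, build one ALTER that is safe to run on any host in the section regardless of its starting state. Indexes present on both sides with different definitions are only reported, because ADD KEY IF NOT EXISTS would silently keep the drifted one. The table contents below are hypothetical placeholders, not the actual patch-fix-index definitions.

```python
def converge_indexes(table: str, expected: dict[str, str], actual: dict[str, str]):
    """Return (alter_sql, changed) for MariaDB-style idempotent index fixes.

    Both dicts map index name -> column list, e.g. '(afl_ip, afl_timestamp)'.
    `changed` lists names whose definitions differ and need manual review.
    """
    drops = sorted(set(actual) - set(expected))
    adds = sorted(set(expected) - set(actual))
    changed = sorted(n for n in set(expected) & set(actual) if expected[n] != actual[n])
    clauses = [f"DROP KEY IF EXISTS {n}" for n in drops]
    clauses += [f"ADD KEY IF NOT EXISTS {n} {expected[n]}" for n in adds]
    sql = (f"ALTER TABLE {table} " + ", ".join(clauses) + ";") if clauses else ""
    return sql, changed

# Hypothetical index sets, for illustration only.
expected = {
    "afl_filter_timestamp": "(afl_filter, afl_timestamp)",
    "afl_ip_timestamp": "(afl_ip, afl_timestamp)",
}
actual = {"afl_filter": "(afl_filter)", "afl_ip_timestamp": "(afl_ip)"}

sql, changed = converge_indexes("abuse_filter_log", expected, actual)
print(sql)
print(changed)  # ['afl_ip_timestamp']
```

Running the same generated statement twice is harmless, which is what makes it practical to fix hosts "slowly but steadily" without tracking exactly which ones were already altered.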

Feb 21 2019, 11:10 AM · AbuseFilter, DBA, Schema-change-in-production
Marostegui updated the task description for T214264: BBU issues on codfw.
Feb 21 2019, 10:43 AM · DBA
Marostegui updated the task description for T214264: BBU issues on codfw.
Feb 21 2019, 10:39 AM · DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Feb 21 2019, 10:24 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

All the hosts are done except db1067 (s1 master, T210713#4967984), which I will try a few more times before stalling this until we do a failover.

Feb 21 2019, 10:07 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Feb 21 2019, 10:06 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210992: Increase parsercache keys TTL from 22 days back to 30 days.

There is no significant increase visible on the graphs, but 2 days might also be too short to notice anything.

Feb 21 2019, 9:44 AM · Performance-Team (Radar), SRE, DBA
Marostegui added a comment to T210992: Increase parsercache keys TTL from 22 days back to 30 days.

In a couple of days it will be a month since I switched the TTL from 22 days to 24. There have not been any issues with this, so on Monday I think I will go from 24 to 30 days, as planned, unless someone has any objection.
Thanks!
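For reference, the TTL steps expressed in seconds, the unit cache expiry settings usually take (an assumption here; the actual configuration key is not shown in this task):

```python
DAY = 24 * 60 * 60  # 86400 seconds

def ttl_seconds(days: int) -> int:
    """Convert a TTL in days to seconds."""
    return days * DAY

for days in (22, 24, 30):
    print(f"{days} days = {ttl_seconds(days)} seconds")
# 22 days = 1900800 seconds
# 24 days = 2073600 seconds
# 30 days = 2592000 seconds
```

The 24 to 30 day step keeps entries 25% longer than the current setting, so any growth in parser cache disk usage should show up gradually rather than at once.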

Feb 21 2019, 9:37 AM · Performance-Team (Radar), SRE, DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Feb 21 2019, 9:31 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui triaged T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID as Medium priority.
Feb 21 2019, 6:29 AM · Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), DBA, MediaWiki-Action-API
Marostegui added a comment to T216656: API problem with usercontribs using `rev_user_text` rather than `rev_user`: Only use 'contributions' replica if querying by user ID.

For s2 we can probably decrease the main traffic weight for the rc replicas (db1103 and db1105), as I think the other hosts will have no problem absorbing the traffic, but this is another case where "special" slaves are snowflakes that bite us :-(
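To illustrate what lowering the weight does, assuming weighted random selection of replicas: each host's expected share of main traffic is its weight divided by the sum of weights in the group. The weights and the two general replica names below are hypothetical, not the actual s2 configuration.

```python
def traffic_shares(weights: dict[str, int]) -> dict[str, float]:
    """Expected fraction of main traffic per replica under weighted random selection."""
    total = sum(weights.values())
    return {host: weight / total for host, weight in weights.items()}

# Hypothetical weights: two general replicas plus the two rc replicas.
before = {"replica-1": 100, "replica-2": 100, "db1103": 100, "db1105": 100}
after = dict(before, db1103=25, db1105=25)

old, new = traffic_shares(before), traffic_shares(after)
for host in before:
    print(f"{host}: {old[host]:.0%} -> {new[host]:.0%}")
```

With these numbers the general replicas go from 25% to 40% each while the rc replicas drop to 10%, which is the extra load the other hosts would need to absorb.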

Feb 21 2019, 6:25 AM · Platform Team Workboards (Done with CPT), MW-1.33-notes (1.33.0-wmf.19; 2019-02-26), DBA, MediaWiki-Action-API
Marostegui added a comment to T51199: Add index log_type_action.

I don't think this is too useful https://phabricator.wikimedia.org/P8114 :(

Feb 21 2019, 6:15 AM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change-in-production, MediaWiki-libs-Rdbms, Schema-change
Marostegui moved T216670: Degraded RAID on db2050 from Triage to In progress on the DBA board.
Feb 21 2019, 6:02 AM · DBA, SRE, ops-codfw
Marostegui assigned T216670: Degraded RAID on db2050 to Papaul.

Let's get the disk changed @Papaul - thanks!
Let's replace only the one that has FAILED, not the ones with predictive failures; those are being tracked at T208323: Predictive failures on disk S.M.A.R.T. status

Feb 21 2019, 6:02 AM · DBA, SRE, ops-codfw

Feb 20 2019

Marostegui added a comment to T216635: MySQL database on deployment-db03 does not start due to InnoDB issue.

Data looks very corrupted. At this point the best option is to rebuild that host from the slave.

Feb 20 2019, 8:45 PM · Release-Engineering-Team, DBA, Beta-Cluster-Infrastructure
Marostegui added a comment to P8111 innodb-force-recovery=1 on deployment-db03 for T216635.

Data looks very corrupted. At this point the best option is to rebuild that host from the slave

Feb 20 2019, 8:44 PM
Marostegui added a comment to T216635: MySQL database on deployment-db03 does not start due to InnoDB issue.

Anything in dmesg?
Can you do a touch /srv/test?

Feb 20 2019, 5:36 PM · Release-Engineering-Team, DBA, Beta-Cluster-Infrastructure
Marostegui added a comment to T51199: Add index log_type_action.

It is certainly being used by some queries; I can see this counter increasing:

root@db1089.eqiad.wmnet[sys]> select rows_selected,select_latency from x$schema_index_statistics where table_name='logging' and index_name like 'log_title_%';
+---------------+----------------+
| rows_selected | select_latency |
+---------------+----------------+
|         14478 |  2739429620928 |
|           225 |    98277884848 |
+---------------+----------------+
2 rows in set (0.05 sec)
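The `x$` views in the sys schema report latencies unformatted, in picoseconds, so the two `select_latency` values above correspond to roughly 2.74 s and 0.098 s of cumulative select time. A tiny helper for that conversion:

```python
PS_PER_SECOND = 10**12  # performance_schema timers count picoseconds

def ps_to_seconds(picoseconds: int) -> float:
    """Convert a raw sys-schema latency value to seconds."""
    return picoseconds / PS_PER_SECOND

for rows, latency in [(14478, 2739429620928), (225, 98277884848)]:
    print(f"{rows} rows selected, {ps_to_seconds(latency):.3f} s total")
# 14478 rows selected, 2.739 s total
# 225 rows selected, 0.098 s total
```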
Feb 20 2019, 5:24 PM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change-in-production, MediaWiki-libs-Rdbms, Schema-change
Marostegui added a comment to T216635: MySQL database on deployment-db03 does not start due to InnoDB issue.

Broken storage?

Feb 18 13:24:54 mysqld[837]: InnoDB: Error number 5 means 'Input/output error'.
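InnoDB reports the operating-system errno here, and 5 is `EIO` on Linux, which points at the storage layer rather than at InnoDB itself. A small sketch that extracts the number from such a line and maps it to its symbolic name with Python's standard `errno` module:

```python
import errno
import re

ERRNO_LINE = re.compile(r"InnoDB: Error number (\d+) means")

def os_error_name(line: str):
    """Return the symbolic errno name (e.g. 'EIO') mentioned in an InnoDB log line."""
    match = ERRNO_LINE.search(line)
    if match is None:
        return None
    return errno.errorcode.get(int(match.group(1)))

line = "Feb 18 13:24:54 mysqld[837]: InnoDB: Error number 5 means 'Input/output error'."
print(os_error_name(line))  # EIO
```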
Feb 20 2019, 5:24 PM · Release-Engineering-Team, DBA, Beta-Cluster-Infrastructure
Marostegui created P8108 (An Untitled Masterwork).
Feb 20 2019, 2:38 PM
Marostegui added a comment to T201133: db1069 (x1 master) memory errors.

Just for the record

db1069
Feb 20 2019, 10:57 AM · ops-eqiad, SRE, DBA
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Feb 20 2019, 10:05 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

s3 eqiad

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004
  • dbstore1002
  • db1124
  • db1123
  • db1095
  • db1077
  • db1075
  • db1078
Feb 20 2019, 10:04 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T210713: Drop change_tag.ct_tag column in production.

db1067 (s1 master) has too much concurrency to let the ALTER go through. I will try a few more times before giving up on it and leaving it for when we fail over either the master or the DC.
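A sketch of the "try a few more times" approach. It assumes each attempt runs with a short `lock_wait_timeout`, so that an ALTER that cannot get its metadata lock gives up quickly instead of stalling the queries queued behind it. `try_alter` is a hypothetical callable returning True on success; the attempt count and pause are placeholders.

```python
import time
from typing import Callable

def retry_alter(try_alter: Callable[[], bool], attempts: int = 5,
                pause_seconds: float = 60.0, sleep=time.sleep) -> bool:
    """Call try_alter() up to `attempts` times, pausing between failures.

    Returns True as soon as one attempt succeeds, False if all fail
    (at which point the change is left for the next failover).
    """
    for attempt in range(1, attempts + 1):
        if try_alter():
            return True
        if attempt < attempts:
            sleep(pause_seconds)
    return False

# Dry run: fail twice, then succeed; sleeping is disabled.
outcomes = iter([False, False, True])
print(retry_alter(lambda: next(outcomes), sleep=lambda s: None))  # True
```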

Feb 20 2019, 10:02 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T210713: Drop change_tag.ct_tag column in production.
Feb 20 2019, 9:55 AM · Schema-change-in-production, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui updated the task description for T86342: Dropping page.page_no_title_convert on wmf databases.
Feb 20 2019, 8:52 AM · Schema-change-in-production, DBA, Schema-change
Marostegui added a comment to T51199: Add index log_type_action.

I have been taking a look at these indexes on enwiki, and we have two indexes in production that are not in tables.sql:

KEY `log_title_time` (`log_title`(16),`log_timestamp`),
KEY `log_title_type_time` (`log_title`(16),`log_type`,`log_timestamp`),
Feb 20 2019, 8:33 AM · MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change-in-production, MediaWiki-libs-Rdbms, Schema-change
Marostegui moved T187295: Apply AbuseFilter patch-fix-index from Backlog to Pending comment on the DBA board.
Feb 20 2019, 7:18 AM · AbuseFilter, DBA, Schema-change-in-production