Marostegui (Manuel Aróstegui)
User

Today

  • Clear sailing ahead.

Tomorrow

  • Clear sailing ahead.

Saturday

  • Clear sailing ahead.

User Details

User Since
Sep 1 2016, 6:48 AM (119 w, 8 h)
Availability
Available
IRC Nick
marostegui
LDAP User
Marostegui
MediaWiki User
MArostegui (WMF) [ Global Accounts ]

TZ: UTC +1/+2

Recent Activity

Today

Marostegui added a comment to T211804: A huge spike on read rows for commonswiki.

Thank you so much @Anomie and @Ladsgroup!!

Thu, Dec 13, 3:14 PM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), Patch-For-Review, DBA
Marostegui added a comment to T207258: rack/setup/install pc1007-pc1010.

@Marostegui and all,

the system board that was replaced yesterday was faulty. Showing errors on DIMM slots B4 and B1. After swapping DIMMs in B with DIMMs in A, the error remained B4 and B1. Also tried swapping CPUs but the errors remained the same. A new board has been ordered along with a couple of replacement DIMM just in case

Thu, Dec 13, 1:41 PM · Operations, ops-eqiad, DBA
Marostegui added a comment to T211804: A huge spike on read rows for commonswiki.

+2'd, I leave it to our DBAs if they think this needs to be backported or we should wait until wmf.9. If you think this needs backport, I can take care of it.

Thu, Dec 13, 12:51 PM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), Patch-For-Review, DBA
Marostegui updated subscribers of T211804: A huge spike on read rows for commonswiki.

We had another spike on commons, same query.
https://grafana.wikimedia.org/d/000000273/mysql?panelId=3&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1084&var-port=9104&kiosk&refresh=10s&from=1544690985082&to=1544701785082

Thu, Dec 13, 11:51 AM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), Patch-For-Review, DBA
Marostegui added a comment to T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking).

@Krenair I was talking with @MarcoAurelio about the hidden table on eswikibooks and we saw it is listed here - can it be dropped everywhere?

Thu, Dec 13, 10:18 AM · Epic, DBA, Tracking
Marostegui created P7912 (An Untitled Masterwork).
Thu, Dec 13, 10:04 AM
Marostegui added a comment to T209488: Global rename of Massimo Telò → Teseo: supervision needed.

Cool - ping @Banyek on this task before you are ready to start!
Thanks!

Thu, Dec 13, 9:46 AM · DBA, Wikimedia-Site-requests
Marostegui removed a project from T110115: Possible to run writes (e.g. UPDATE) on Beta Cluster replica: DBA.
Thu, Dec 13, 9:45 AM · Operations, Beta-Cluster-Infrastructure
Marostegui added a comment to T209488: Global rename of Massimo Telò → Teseo: supervision needed.

Let's do it at 10 UTC, would that work for you? //cc @Banyek

Thu, Dec 13, 9:44 AM · DBA, Wikimedia-Site-requests
Marostegui added a comment to T209488: Global rename of Massimo Telò → Teseo: supervision needed.

@1997kB can we do this tomorrow?

Thu, Dec 13, 9:42 AM · DBA, Wikimedia-Site-requests
Marostegui closed T209761: Drop FlaggedRevs tables in database for srwikinews as Resolved.

Tables are gone:

root@db1075.eqiad.wmnet[srwikinews]> show tables like '%flagged%';
Empty set (0.00 sec)
Thu, Dec 13, 8:34 AM · MediaWiki-extensions-FlaggedRevs, DBA, Wikimedia-Site-requests, User-Zoranzoki21
Marostegui closed T209761: Drop FlaggedRevs tables in database for srwikinews, a subtask of T54921: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking), as Resolved.
Thu, Dec 13, 8:34 AM · Epic, DBA, Tracking
Marostegui closed T209761: Drop FlaggedRevs tables in database for srwikinews, a subtask of T209251: Remove FlaggedRevs and add back rights autopatrolled, patroller (with enabled RCPatrol), rollbacker on srwikinews, as Resolved.
Thu, Dec 13, 8:33 AM · Wikimedia-Site-requests, User-Zoranzoki21
Marostegui added a comment to T209761: Drop FlaggedRevs tables in database for srwikinews.

I have left a temporary backup at:

root@db1078:/srv/tmp/T209761
Thu, Dec 13, 8:22 AM · MediaWiki-extensions-FlaggedRevs, DBA, Wikimedia-Site-requests, User-Zoranzoki21
Marostegui added a comment to T209761: Drop FlaggedRevs tables in database for srwikinews.

No trace of errors on logtash for this wiki since 11th Dec. So I am going to proceed dropping these tables

Thu, Dec 13, 8:15 AM · MediaWiki-extensions-FlaggedRevs, DBA, Wikimedia-Site-requests, User-Zoranzoki21
Marostegui moved T211804: A huge spike on read rows for commonswiki from In progress to Blocked external/Not db team on the DBA board.
Thu, Dec 13, 7:33 AM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), Patch-For-Review, DBA
Marostegui created T211850: install2002 94% disk usage on "/".
Thu, Dec 13, 6:19 AM · Operations

Yesterday

Marostegui added a comment to T211804: A huge spike on read rows for commonswiki.

Thanks @Anomie!
Let's get it merged, yes, please!

Wed, Dec 12, 7:19 PM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), Patch-For-Review, DBA
Marostegui updated subscribers of T211774: Full table scans on oldimage table.
Wed, Dec 12, 3:32 PM · MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), Patch-For-Review, Core Platform Team Kanban (Waiting for Review), MediaWiki-Database, Release, Train Deployments
Marostegui created T211774: Full table scans on oldimage table.
Wed, Dec 12, 3:31 PM · MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), Patch-For-Review, Core Platform Team Kanban (Waiting for Review), MediaWiki-Database, Release, Train Deployments
Marostegui added a comment to T211769: Wikimedia\Rdbms\Database::selectSQLText called from ApiBase::filterIDs with incorrect parameters: $conds must be a string or an array.

@Anomie

root@db1084.eqiad.wmnet[sys]> select  query,first_seen,last_seen,total_latency,exec_count,rows_examined,rows_sent from x$statements_with_full_table_scans where first_seen like '2018-12-12%' and query not like '%statements_with_full_table_scans%'\G
*************************** 1. row ***************************
        query: SELECT `oi_timestamp` AS `img_timestamp` , `oi_name` AS `img_name` , `oi_name` AS `thumb` , `oi_size` AS `img_size` , ? AS `top` , COALESCE ( `comment_oi_description` . `comment_text` , `oi_description` ) AS `oi_description_text` , `comment_oi_description` . `comment_data` AS `oi_description_data` , `comment_oi_description` . `comment_id` AS `oi_description_cid` , ? AS `description_field` , `oi_user` AS `img_user` , `oi_user_text` AS `img_user_text` , ? AS `img_actor` FROM `oldimage` LEFT JOIN `comment` `comment_oi_description` ON ( ( `comment_oi_description` . `comment_id` = `oi_description_id` ) ) WHERE ( ( `oi_user` = ? ) ) AND `oi_deleted` = ? ORDER BY `oi_timestamp` DESC LIMIT ?
   first_seen: 2018-12-12 14:15:53
    last_seen: 2018-12-12 14:28:56
total_latency: 130627767982000
   exec_count: 25
rows_examined: 110432980
    rows_sent: 877
*************************** 2. row ***************************
        query: SELECT `oi_timestamp` AS `img_timestamp` , `oi_name` AS `img_name` , `oi_name` AS `thumb` , `oi_size` AS `img_size` , ? AS `top` , COALESCE ( `comment_oi_description` . `comment_text` , `oi_description` ) AS `oi_description_text` , `comment_oi_description` . `comment_data` AS `oi_description_data` , `comment_oi_description` . `comment_id` AS `oi_description_cid` , ? AS `description_field` , `oi_user` AS `img_user` , `oi_user_text` AS `img_user_text` , ? AS `img_actor` FROM `oldimage` LEFT JOIN `comment` `comment_oi_description` ON ( ( `comment_oi_description` . `comment_id` = `oi_description_id` ) ) WHERE ( ( `oi_user` = ? ) ) AND `oi_deleted` = ? AND ( `oi_timestamp` >= ? ) ORDER BY `oi_timestamp` LIMIT ?
   first_seen: 2018-12-12 14:22:36
    last_seen: 2018-12-12 14:28:07
total_latency: 27711608943000
   exec_count: 7
rows_examined: 30920774
    rows_sent: 3
*************************** 3. row ***************************
        query: SELECT `oi_timestamp` AS `img_timestamp` , `oi_name` AS `img_name` , `oi_name` AS `thumb` , `oi_size` AS `img_size` , ? AS `top` , COALESCE ( `comment_oi_description` . `comment_text` , `oi_description` ) AS `oi_description_text` , `comment_oi_description` . `comment_data` AS `oi_description_data` , `comment_oi_description` . `comment_id` AS `oi_description_cid` , ? AS `description_field` , `oi_user` AS `img_user` , `oi_user_text` AS `img_user_text` , ? AS `img_actor` FROM `oldimage` LEFT JOIN `comment` `comment_oi_description` ON ( ( `comment_oi_description` . `comment_id` = `oi_description_id` ) ) WHERE ( ( `oi_user` = ? ) ) AND `oi_deleted` = ? AND ( `oi_timestamp` < ? ) ORDER BY `oi_timestamp` DESC LIMIT ?
   first_seen: 2018-12-12 14:22:40
    last_seen: 2018-12-12 14:28:11
total_latency: 26374348551000
   exec_count: 7
rows_examined: 30921366
    rows_sent: 299
*************************** 4. row ***************************
        query: SELECT `oi_timestamp` AS `img_timestamp` , `oi_name` AS `img_name` , `oi_name` AS `thumb` , `oi_size` AS `img_size` , ? AS `top` , COALESCE ( `comment_oi_description` . `comment_text` , `oi_description` ) AS `oi_description_text` , `comment_oi_description` . `comment_data` AS `oi_description_data` , `comment_oi_description` . `comment_id` AS `oi_description_cid` , ? AS `description_field` , `oi_user` AS `img_user` , `oi_user_text` AS `img_user_text` , ? AS `img_actor` FROM `oldimage` LEFT JOIN `comment` `comment_oi_description` ON ( ( `comment_oi_description` . `comment_id` = `oi_description_id` ) ) WHERE ( ( `oi_user` = ? ) ) AND `oi_deleted` = ? AND ( `oi_timestamp` <= ? ) ORDER BY `oi_timestamp` DESC LIMIT ?
   first_seen: 2018-12-12 14:27:52
    last_seen: 2018-12-12 14:28:02
total_latency: 7693955058000
   exec_count: 2
rows_examined: 8834514
    rows_sent: 2
*************************** 5. row ***************************
        query: SELECT `oi_timestamp` AS `img_timestamp` , `oi_name` AS `img_name` , `oi_name` AS `thumb` , `oi_size` AS `img_size` , ? AS `top` , COALESCE ( `comment_oi_description` . `comment_text` , `oi_description` ) AS `oi_description_text` , `comment_oi_description` . `comment_data` AS `oi_description_data` , `comment_oi_description` . `comment_id` AS `oi_description_cid` , ? AS `description_field` , `oi_user` AS `img_user` , `oi_user_text` AS `img_user_text` , ? AS `img_actor` FROM `oldimage` LEFT JOIN `comment` `comment_oi_description` ON ( ( `comment_oi_description` . `comment_id` = `oi_description_id` ) ) WHERE ( ( `oi_user` = ? ) ) AND `oi_deleted` = ? AND ( `oi_timestamp` > ? ) ORDER BY `oi_timestamp` LIMIT ?
   first_seen: 2018-12-12 14:27:56
    last_seen: 2018-12-12 14:28:05
total_latency: 7299142771000
   exec_count: 2
rows_examined: 8834714
    rows_sent: 102
*************************** 6. row ***************************
        query: SELECT `oi_timestamp` AS `img_timestamp` , `oi_name` AS `img_name` , `oi_name` AS `thumb` , `oi_size` AS `img_size` , ? AS `top` , COALESCE ( `comment_oi_description` . `comment_text` , `oi_description` ) AS `oi_description_text` , `comment_oi_description` . `comment_data` AS `oi_description_data` , `comment_oi_description` . `comment_id` AS `oi_description_cid` , ? AS `description_field` , `oi_user` AS `img_user` , `oi_user_text` AS `img_user_text` , ? AS `img_actor` FROM `oldimage` LEFT JOIN `comment` `comment_oi_description` ON ( ( `comment_oi_description` . `comment_id` = `oi_description_id` ) ) WHERE ( ( `oi_user` = ? ) ) AND `oi_deleted` = ? ORDER BY `oi_timestamp` LIMIT ?
   first_seen: 2018-12-12 14:27:39
    last_seen: 2018-12-12 14:27:39
total_latency: 3750216586000
   exec_count: 1
rows_examined: 4417357
    rows_sent: 51
*************************** 7. row ***************************
        query: SELECT * FROM `schema_tables_with_full_table_scans`
   first_seen: 2018-12-12 14:57:55
    last_seen: 2018-12-12 14:57:55
total_latency: 42163868000
   exec_count: 1
rows_examined: 610
    rows_sent: 31
7 rows in set (0.03 sec)
Wed, Dec 12, 3:24 PM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), MediaWiki-API, Patch-For-Review, MediaWiki-General-or-Unknown
Marostegui created P7910 (An Untitled Masterwork).
Wed, Dec 12, 3:16 PM
Marostegui added a comment to T211769: Wikimedia\Rdbms\Database::selectSQLText called from ApiBase::filterIDs with incorrect parameters: $conds must be a string or an array.

Seems unlikely to me. Is there a separate task with details on what queries were doing full scans?

Wed, Dec 12, 3:02 PM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), MediaWiki-API, Patch-For-Review, MediaWiki-General-or-Unknown
Marostegui updated subscribers of T211769: Wikimedia\Rdbms\Database::selectSQLText called from ApiBase::filterIDs with incorrect parameters: $conds must be a string or an array.

The reason we reverted was the spike on full scans, which we don't know yet if it is or not related to this: https://grafana.wikimedia.org/render/d-solo/000000273/mysql?panelId=3&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1084&var-port=9104&kiosk&refresh=10s&from=1544614804395&to=1544625604395&width=1000&height=500&tz=Europe%2FMadrid

Wed, Dec 12, 2:40 PM · MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team Kanban (Waiting for Review), MediaWiki-API, Patch-For-Review, MediaWiki-General-or-Unknown
Marostegui moved T116793: Investigate slow query logging/digest for Beta Cluster from Triage to Backlog on the DBA board.
Wed, Dec 12, 9:39 AM · DBA, Beta-Cluster-Infrastructure
Marostegui claimed T210713: Drop change_tag.ct_tag column in production.
Wed, Dec 12, 9:18 AM · Blocked-on-schema-change, User-Ladsgroup, MediaWiki-Change-tagging
Marostegui added a comment to T202167: Schema change for rc_this_oldid index.

s7 eqiad progress

Wed, Dec 12, 8:21 AM · Patch-For-Review, DBA, Blocked-on-schema-change, Scoring-platform-team, User-Ladsgroup, MediaWiki-Database
Marostegui updated the task description for T202167: Schema change for rc_this_oldid index.
Wed, Dec 12, 8:21 AM · Patch-For-Review, DBA, Blocked-on-schema-change, Scoring-platform-team, User-Ladsgroup, MediaWiki-Database
Marostegui added a comment to T86338: Dropping page.page_counter on wmf databases.

s7 eqiad progress

Wed, Dec 12, 8:21 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui updated the task description for T86338: Dropping page.page_counter on wmf databases.
Wed, Dec 12, 8:20 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui updated the task description for T202167: Schema change for rc_this_oldid index.
Wed, Dec 12, 6:36 AM · Patch-For-Review, DBA, Blocked-on-schema-change, Scoring-platform-team, User-Ladsgroup, MediaWiki-Database
Marostegui updated the task description for T86338: Dropping page.page_counter on wmf databases.
Wed, Dec 12, 6:35 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui reassigned T209858: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return) from Papaul to RobH.
Wed, Dec 12, 6:14 AM · Patch-For-Review, decommission, Operations, ops-codfw, DBA
Marostegui moved T207253: Compare a few tables per section between hosts and DC from In progress to Next on the DBA board.
Wed, Dec 12, 6:11 AM · Patch-For-Review, User-Banyek, Wikimedia-Incident, DBA
Marostegui added a comment to T210992: Increase parsercache keys TTL from 22 days back to 30 days.

Thanks @aaron - I will get a patch out after the code freeze

Wed, Dec 12, 6:10 AM · Operations, Performance-Team, DBA
Marostegui added a comment to T209761: Drop FlaggedRevs tables in database for srwikinews.

Thanks @Reedy!

root@cumin1001:~# for i in flaggedtemplates flaggedpage_config flaggedimages flaggedpage_pending flaggedpages ; do mysql.py -hdb1078 srwikinews -e "set session sql_log_bin=0; rename table $i to T209761_$i;";done
root@db1078.eqiad.wmnet[srwikinews]> show tables like '%flagged%';
+----------------------------------+
| Tables_in_srwikinews (%flagged%) |
+----------------------------------+
| T209761_flaggedimages            |
| T209761_flaggedpage_config       |
| T209761_flaggedpage_pending      |
| T209761_flaggedpages             |
| T209761_flaggedrevs              |
| T209761_flaggedrevs_promote      |
| T209761_flaggedrevs_statistics   |
| T209761_flaggedrevs_stats        |
| T209761_flaggedrevs_stats2       |
| T209761_flaggedrevs_tracking     |
| T209761_flaggedtemplates         |
+----------------------------------+
11 rows in set (0.00 sec)
Wed, Dec 12, 6:09 AM · MediaWiki-extensions-FlaggedRevs, DBA, Wikimedia-Site-requests, User-Zoranzoki21
Marostegui added a comment to T209858: Decommission parsercache hosts: pc2004 pc2005 pc2006 (Dec 2018 lease return).

After those last merges, is this good to be closed? @Papaul @RobH?
Thanks!

Wed, Dec 12, 6:06 AM · Patch-For-Review, decommission, Operations, ops-codfw, DBA

Tue, Dec 11

Marostegui closed T211210: labsdb1004 replication broken for linkwatcher_linklog table as Resolved.

labsdb1004 finally caught up and so far everything is fine.
The table seems to be in sync.

for i in labsdb1004 labsdb1005; do echo $i; mysql.py -h$i s51230__linkwatcher -e " select max(id) from linkwatcher_linklog;";done
labsdb1004
+-----------+
| max(id)   |
+-----------+
| 574664355 |
+-----------+
labsdb1005
+-----------+
| max(id)   |
+-----------+
| 574664355 |
+-----------+
Tue, Dec 11, 8:37 AM · DBA
Marostegui claimed T209761: Drop FlaggedRevs tables in database for srwikinews.

I have renamed the tables on db1078 for now, to make sure nothing really breaks.
Renamed then on db1078 only:

root@cumin1001:/home/marostegui# for i in flaggedrevs flaggedrevs_promote flaggedrevs_statistics flaggedrevs_stats flaggedrevs_stats2 flaggedrevs_tracking; do mysql.py -hdb1078 srwikinews -e "set session sql_log_bin=0; rename table $i to T209761_$i;";done
Tue, Dec 11, 6:47 AM · MediaWiki-extensions-FlaggedRevs, DBA, Wikimedia-Site-requests, User-Zoranzoki21
Marostegui moved T211613: rack/setup/install db11[26-38].eqiad.wmnet from Triage to Blocked external/Not db team on the DBA board.
Tue, Dec 11, 6:25 AM · Patch-For-Review, DBA, ops-eqiad, User-Marostegui, Operations
Marostegui added a comment to T211613: rack/setup/install db11[26-38].eqiad.wmnet.

Hey @RobH!
Thanks for putting up an initial racking plan.

Tue, Dec 11, 6:25 AM · Patch-For-Review, DBA, ops-eqiad, User-Marostegui, Operations
Marostegui updated the task description for T86338: Dropping page.page_counter on wmf databases.
Tue, Dec 11, 6:05 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui updated the task description for T202167: Schema change for rc_this_oldid index.
Tue, Dec 11, 6:05 AM · Patch-For-Review, DBA, Blocked-on-schema-change, Scoring-platform-team, User-Ladsgroup, MediaWiki-Database
Marostegui updated the task description for T86338: Dropping page.page_counter on wmf databases.
Tue, Dec 11, 6:04 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui added a comment to T86338: Dropping page.page_counter on wmf databases.

db1068 (s4 master) has too much concurrency on the page table to be able to alter it live (metada locking issues)

Tue, Dec 11, 6:04 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui updated the task description for T202167: Schema change for rc_this_oldid index.
Tue, Dec 11, 6:00 AM · Patch-For-Review, DBA, Blocked-on-schema-change, Scoring-platform-team, User-Ladsgroup, MediaWiki-Database

Mon, Dec 10

Marostegui added a comment to T211210: labsdb1004 replication broken for linkwatcher_linklog table.

I think I have replayed all the transactions that were skipped during that time. Of course, going thru binlogs is hard and tedious and I might have missed transactions, so far replication is flowing again.
We'll see in the next few hours/days once DELETEs or UPDATEs arrive to purge old rows.

Mon, Dec 10, 8:02 PM · DBA
Marostegui added a comment to T211210: labsdb1004 replication broken for linkwatcher_linklog table.

I was thinking we should let it catch up, and then redo the import, but with mydumper instead of mysqldump as we were talking about that
Also it's weird why it happened again

Mon, Dec 10, 6:57 PM · DBA
Marostegui added a comment to T211210: labsdb1004 replication broken for linkwatcher_linklog table.

aand after I restarted the instance, I've got:

Last_SQL_Error: Could not execute Write_rows_v1 event on table s51230__linkwatcher.linkwatcher_linklog; Duplicate entry '573509232' for key 'ID', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log log.235715, end_log_pos 90204102

I add the table again to the ignore list, and restart replication, we'll see what to do after it catched up

Mon, Dec 10, 6:36 PM · DBA
Marostegui updated the task description for T211593: Grant addshore access to test-s4 servers.
Mon, Dec 10, 3:04 PM · DBA
Marostegui closed T211593: Grant addshore access to test-s4 servers as Resolved.
Mon, Dec 10, 2:57 PM · DBA
Marostegui added a comment to T208622: Import recommendations into production database.

Why not using mwmaint1002 for the import?
I just checked and it can reach m2-master fine.

Mon, Dec 10, 11:15 AM · Analytics, User-Banyek, Patch-For-Review, Operations, Research
Marostegui added a comment to T86338: Dropping page.page_counter on wmf databases.

s8 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1124
  • db1116
  • db1109
  • db1104
  • db1101
  • db1099
  • db1092
  • db1087
  • db1071
Mon, Dec 10, 10:22 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui added a comment to T202167: Schema change for rc_this_oldid index.

s8 eqiad progress

  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1002
  • db1124
  • db1116
  • db1109
  • db1104
  • db1101
  • db1099
  • db1092
  • db1087
  • db1071
Mon, Dec 10, 10:22 AM · Patch-For-Review, DBA, Blocked-on-schema-change, Scoring-platform-team, User-Ladsgroup, MediaWiki-Database
Marostegui updated the task description for T86338: Dropping page.page_counter on wmf databases.
Mon, Dec 10, 10:04 AM · Patch-For-Review, Blocked-on-schema-change, DBA, Schema-change
Marostegui updated the task description for T202167: Schema change for rc_this_oldid index.
Mon, Dec 10, 10:04 AM · Patch-For-Review, DBA, Blocked-on-schema-change, Scoring-platform-team, User-Ladsgroup, MediaWiki-Database
Marostegui moved T209761: Drop FlaggedRevs tables in database for srwikinews from Blocked external/Not db team to Next on the DBA board.

Thanks!

Mon, Dec 10, 7:14 AM · MediaWiki-extensions-FlaggedRevs, DBA, Wikimedia-Site-requests, User-Zoranzoki21
Marostegui triaged T209761: Drop FlaggedRevs tables in database for srwikinews as Normal priority.

@Zoranzoki21 can you confirm if this is good to go and if these are the tables that need to be dropped?

root@db1077.eqiad.wmnet[srwikinews]> show tables like '%flaggedre%';
+------------------------------------+
| Tables_in_srwikinews (%flaggedre%) |
+------------------------------------+
| flaggedrevs                        |
| flaggedrevs_promote                |
| flaggedrevs_statistics             |
| flaggedrevs_stats                  |
| flaggedrevs_stats2                 |
| flaggedrevs_tracking               |
+------------------------------------+
6 rows in set (0.00 sec)
Mon, Dec 10, 6:37 AM · MediaWiki-extensions-FlaggedRevs, DBA, Wikimedia-Site-requests, User-Zoranzoki21
Marostegui updated the task description for T208323: Predictive failures on disk S.M.A.R.T. status.
Mon, Dec 10, 6:35 AM · Operations, DBA
Marostegui closed T210749: Hardware for cloud db replicas for analytics usage as Resolved.

Closing this as the hardware has been decided to be purchased and will be followed up at: {T211135}

Mon, Dec 10, 6:35 AM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
Marostegui closed T202889: cloudvps: dedicated openstack database as Resolved.

Closing this for now per T202889#4798131
If someone feels we need to revisit this, please re-open!
Thanks everyone

Mon, Dec 10, 6:34 AM · cloud-services-team (Kanban), Patch-For-Review, DBA
Marostegui moved T211537: Degraded RAID on db1063 from Triage to In progress on the DBA board.
Mon, Dec 10, 6:33 AM · DBA, ops-eqiad, Operations
Marostegui assigned T211537: Degraded RAID on db1063 to Cmjohnson.

@Cmjohnson I am setting this to high priority because there is one failed disk and another one with smart errors (on a different SPAN).
Let's replace only the failed one.
This is m1 master, so let's do this as soon as possible.

Mon, Dec 10, 6:32 AM · DBA, ops-eqiad, Operations
Marostegui closed T211338: Make a copy of the current wb_terms table on the MCR testing DB servers as Resolved.

@Addshore wb_terms has been imported into db1111 (and replicated to db1112).
Please check that you have access and if not talk to me privately on IRC:

Mon, Dec 10, 6:27 AM · Wikidata, DBA

Sun, Dec 9

Marostegui updated subscribers of T211512: "sql" command fails with "sh: 1: mysql: not found" on mwdebug1002.

I don't know if it used to work on the old hosts.
MySQL client isn't installed there - not sure if it used to be and "got lost" when migrating to stretch.

Sun, Dec 9, 7:32 PM · Patch-For-Review, Operations

Fri, Dec 7

Marostegui added a project to T211442: Database errors during MovePage operations for file moves causes data loss.: MediaWiki-Database.
Fri, Dec 7, 9:14 PM · MW-1.33-notes (1.33.0-wmf.8; 2018-12-11), MediaWiki-Database, Patch-For-Review, Multimedia, MediaWiki-File-management, Commons
Marostegui added a comment to T208622: Import recommendations into production database.

@Marostegui

Is this going to be a one time import?

Maybe a 4-5 time import, maybe less. For now we have all the data needed in MySQL (thanks to Andrew).

Why does it need to be done from stat1007? They should connect to m2-master will route them thru the proxy.

Because I have access to stat1007 and not neodymium, for example. The data is also in stat1007 (although we can share download it to other machines too). I can try m2-master, but I'm not sure if I have access to it either.

Fri, Dec 7, 6:24 PM · Analytics, User-Banyek, Patch-For-Review, Operations, Research
Marostegui added a comment to T208622: Import recommendations into production database.

Is this going to be a one time import?
How much data will be imported?

Fri, Dec 7, 3:35 PM · Analytics, User-Banyek, Patch-For-Review, Operations, Research
Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

To duplicate something like check_private_data in Hadoop, I'd guess a day to write it and a couple of days to review and test it. So probably like a week to get everything deployed and integrated with the current job. We have to change some other jobs too to make them depend on this check.

Unlike the cloud replicas, we don't replicate and publish everything, we only pull a specific set of fields from a specific set of tables. So I don't see how something would sneak by, but if it makes you feel better, I'm happy to spend the week and get it done :)

Fri, Dec 7, 3:22 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui moved T119371: Enable MariaDB/MySQL strict mode on CI db hosts from Triage to Blocked external/Not db team on the DBA board.
Fri, Dec 7, 2:59 PM · Release-Engineering-Team (Kanban), Quibble, MediaWiki-Database, DBA, Continuous-Integration-Infrastructure
Marostegui moved T101502: Slow API list=recentchanges query (DBError: Lost connection to MySQL server during query) from Blocked external/Not db team to Backlog on the DBA board.
Fri, Dec 7, 2:58 PM · DBA, Wikimedia-production-error, MediaWiki-API, Pywikibot-tests, Pywikibot
Marostegui moved T101502: Slow API list=recentchanges query (DBError: Lost connection to MySQL server during query) from Triage to Blocked external/Not db team on the DBA board.
Fri, Dec 7, 2:58 PM · DBA, Wikimedia-production-error, MediaWiki-API, Pywikibot-tests, Pywikibot
Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Thanks for the detailed analysis.
My proposal was more in the line to check that the JOIN is always safe.
On the sanitarium+labs side we do have 4 kind of "security" (note the " "!) layers

Fri, Dec 7, 2:43 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui removed a project from T141765: Error: 2013 Lost connection to MySQL server during query on NewFilesPager (again): DBA.
Fri, Dec 7, 2:25 PM · Growth-Team, MediaWiki-Database, MediaWiki-Recent-changes
Marostegui added a comment to T211210: labsdb1004 replication broken for linkwatcher_linklog table.

You could've used compare.py, it might have taken several hours though.

so far I just used it to see if there are diffs, not for finding those

Fri, Dec 7, 10:44 AM · DBA
Marostegui added a comment to T211210: labsdb1004 replication broken for linkwatcher_linklog table.

You could've used compare.py, it might have taken several hours though.

Fri, Dec 7, 9:52 AM · DBA
Marostegui removed a project from T208231: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26): DBA.
Fri, Dec 7, 8:26 AM · Community-Tech, Performance, MediaWiki-extensions-PageAssessments, User-Banyek, Operations
Marostegui added a comment to T208231: Issues with purgeUnusedProjects.php cron job on mwmaint1002 (Fri Oct 26).

i'd like to add the owner of the script as a subscriber, but I don't know how to find who is it

Fri, Dec 7, 8:14 AM · Community-Tech, Performance, MediaWiki-extensions-PageAssessments, User-Banyek, Operations
Marostegui renamed T209521: replication broken on db1124:3318 on wikidata.pagelinks from replication broken on db1124 to replication broken on db1124:3318 on wikidata.pagelinks.
Fri, Dec 7, 7:39 AM · DBA
Marostegui moved T211338: Make a copy of the current wb_terms table on the MCR testing DB servers from Backlog to In progress on the DBA board.
Fri, Dec 7, 7:30 AM · Wikidata, DBA
Marostegui claimed T211338: Make a copy of the current wb_terms table on the MCR testing DB servers.

As we discussed, this is a one time thing we happen to be able to do now, but this cannot be guaranteed in the future.
I will restore wb_terms from the backups, it will take quite some time.

Fri, Dec 7, 7:30 AM · Wikidata, DBA
Marostegui added a comment to T210749: Hardware for cloud db replicas for analytics usage .

@Nuria do you think we should close this as it is decided we'll go for a host with the same specs and config than the rest of clouds replicas (T211135)?

Fri, Dec 7, 6:59 AM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Thanks very much @Anomie, I understand my misunderstanding, and your third answer is what I was asking.

Fri, Dec 7, 6:45 AM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui added a comment to T207258: rack/setup/install pc1007-pc1010.

Thanks for the heads up @Cmjohnson!

Fri, Dec 7, 6:39 AM · Operations, ops-eqiad, DBA
Marostegui added a comment to T209521: replication broken on db1124:3318 on wikidata.pagelinks.

This broke again yesterday:

Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table wikidatawiki.pagelinks; Can't find record in 'pagelinks', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master l
Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [Warning] Slave: Can't find record in 'pagelinks' Error_code: 1032
Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'db1087-bin.003545' position 943920625
Dec 06 18:10:48 db1124 mysqld[3110]: 2018-12-06 18:10:48 140545623860992 [Note] Slave SQL thread exiting, replication stopped in log 'db1087-bin.003545' at position 943920625
Fri, Dec 7, 6:27 AM · DBA
Marostegui closed T211405: pc2007.codfw.wmnet network blip? as Resolved.

See IRC/internal channel, during the setup of logstash2001.codfw.wmnet it accidentally reused the 10.192.0.104 A record.

Fri, Dec 7, 6:20 AM · Operations, netops
Marostegui created T211405: pc2007.codfw.wmnet network blip?.
Fri, Dec 7, 6:15 AM · Operations, netops

Thu, Dec 6

Marostegui raised the priority of T211210: labsdb1004 replication broken for linkwatcher_linklog table from Normal to High.
Thu, Dec 6, 6:52 PM · DBA
Marostegui added a comment to T211210: labsdb1004 replication broken for linkwatcher_linklog table.

You should use root to run the script.
If the table is that big you can either reimport it entirely or just diff the table and fix the inconsistencies manually.

Thu, Dec 6, 12:56 PM · DBA

Wed, Dec 5

Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

I am not sure it makes sense to continue that approach.
We have more than 900 wikis, even if it takes 5GB average per wiki to for that materialized table...we don't have enough space for creating all those tables.

Wed, Dec 5, 8:15 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui added a comment to T211210: labsdb1004 replication broken for linkwatcher_linklog table.

From what I can see it is part of a huge transaction - just ignore that table on the replication filters, let it replicate until it is in sync again and then reimport the table from the master.
We have a script for it, check reimport_from_master.sh on labsdb1004 on my /home.

Wed, Dec 5, 6:09 PM · DBA
Marostegui added a comment to T207258: rack/setup/install pc1007-pc1010.

Awesome! Thank you @Cmjohnson! :)
If you get it online today, reminder: RAID5 with 256 stripe! (Reminding it because it is not the usual config)

Wed, Dec 5, 4:29 PM · Operations, ops-eqiad, DBA
Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

We don't sanitize anything on actor or comment on a triggers level on sanitarium if that is what you ask.

*************************** 1. row ***************************
             Trigger: abuse_filter_log_insert
               Event: INSERT
               Table: abuse_filter_log
           Statement: SET NEW.afl_ip = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 2. row ***************************
             Trigger: abuse_filter_log_update
               Event: UPDATE
               Table: abuse_filter_log
           Statement: SET NEW.afl_ip = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 3. row ***************************
             Trigger: archive_insert
               Event: INSERT
               Table: archive
           Statement: SET NEW.ar_comment = '', NEW.ar_comment_id = 0
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 4. row ***************************
             Trigger: archive_update
               Event: UPDATE
               Table: archive
           Statement: SET NEW.ar_comment = '', NEW.ar_comment_id = 0
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 5. row ***************************
             Trigger: ep_courses_insert
               Event: INSERT
               Table: ep_courses
           Statement: SET NEW.course_token = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 6. row ***************************
             Trigger: ep_courses_update
               Event: UPDATE
               Table: ep_courses
           Statement: SET NEW.course_token = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 7. row ***************************
             Trigger: recentchanges_insert
               Event: INSERT
               Table: recentchanges
           Statement: SET NEW.rc_ip = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 8. row ***************************
             Trigger: recentchanges_update
               Event: UPDATE
               Table: recentchanges
           Statement: SET NEW.rc_ip = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 9. row ***************************
             Trigger: revision_insert
               Event: INSERT
               Table: revision
           Statement: SET NEW.rev_text_id = 0
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 10. row ***************************
             Trigger: revision_update
               Event: UPDATE
               Table: revision
           Statement: SET NEW.rev_text_id = 0
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 11. row ***************************
             Trigger: user_insert
               Event: INSERT
               Table: user
           Statement: SET NEW.user_password = '', NEW.user_newpassword = '', NEW.user_email = '', NEW.user_options = '', NEW.user_token = '', NEW.user_email_authenticated = '', NEW.user_email_token = '', NEW.user_email_token_expires = '', NEW.user_newpass_time = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
*************************** 12. row ***************************
             Trigger: user_update
               Event: UPDATE
               Table: user
           Statement: SET NEW.user_password = '', NEW.user_newpassword = '', NEW.user_email = '', NEW.user_options = '', NEW.user_token = '', NEW.user_email_authenticated = '', NEW.user_email_token = '', NEW.user_email_token_expires = '', NEW.user_newpass_time = ''
              Timing: BEFORE
             Created: NULL
            sql_mode: IGNORE_BAD_TABLE_OPTIONS
             Definer: root@localhost
character_set_client: utf8
collation_connection: utf8_general_ci
  Database Collation: binary
Wed, Dec 5, 3:53 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui created T211210: labsdb1004 replication broken for linkwatcher_linklog table.
Wed, Dec 5, 3:12 PM · DBA
Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Thanks for the clarification - I just wanted to know if there was some specific reason for it that I might have missed.
Once the test is done make sure to either clean it up or move it to the views DB, as that is where the labsdbuser has access to and to keep it consistent. (views on one db and underlying core tables on the other)

Wed, Dec 5, 1:42 PM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

Ah, I see, labsdb1010 but why is the view comment_view_temp on enwiki and not on enwiki_p where the views live? Is there any specific reason?

Wed, Dec 5, 11:46 AM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui added a comment to T210693: Create materialized views on Wiki Replica hosts for better query performance.

On which host and on which database?

Wed, Dec 5, 11:43 AM · Patch-For-Review, User-Banyek, Core Platform Team Backlog (Watching / External), Analytics-Kanban, DBA, Data-Services, Analytics
Marostegui added a comment to T197486: prop=revisions API timing out for a specific user and pages they edited.

And after upgrading db1090:3317, it gets fixed:

root@db1090.eqiad.wmnet[eswiki]> select @@hostname; select version();
+------------+
| @@hostname |
+------------+
| db1090     |
+------------+
1 row in set (0.00 sec)
Wed, Dec 5, 10:52 AM · DBA, MediaWiki-Database, MediaWiki-API
Marostegui added a comment to T197486: prop=revisions API timing out for a specific user and pages they edited.

Also happens on eswiki (s7 - the biggest revision table of all those wikis on s7, 44GB) (db1079 - 10.1.33):

root@db1079.eqiad.wmnet[eswiki]> explain SELECT * FROM revision WHERE (rev_timestamp < '20180613142145' OR rev_timestamp = '20180613142145' AND rev_id <= 845694425) AND rev_page = 6097020 AND NOT(rev_user = 2619640)  ORDER BY rev_timestamp DESC,rev_id DESC LIMIT 101;
+------+-------------+----------+-------+-------------------------------------------------------------------------------------+----------------+---------+------+-----------+-------------+
| id   | select_type | table    | type  | possible_keys                                                                       | key            | key_len | ref  | rows      | Extra       |
+------+-------------+----------+-------+-------------------------------------------------------------------------------------+----------------+---------+------+-----------+-------------+
|    1 | SIMPLE      | revision | index | PRIMARY,rev_timestamp,page_timestamp,user_timestamp,page_user_timestamp,rev_page_id | page_timestamp | 20      | NULL | 100378501 | Using where |
+------+-------------+----------+-------+-------------------------------------------------------------------------------------+----------------+---------+------+-----------+-------------+
Wed, Dec 5, 10:32 AM · DBA, MediaWiki-Database, MediaWiki-API
Marostegui added a comment to T197486: prop=revisions API timing out for a specific user and pages they edited.

Looks like this only happens on enwiki (I guess because of the size of the revision table there).
I have upgraded db1080 to 10.1.37:

Wed, Dec 5, 9:43 AM · DBA, MediaWiki-Database, MediaWiki-API